# Week 8 Machine Learning

### Unsupervised learning

- Data does not have labels (y values).
- Clustering is a way to find structure within data sets.
- Applications of clustering:
  - Market segmentation
  - Astronomy
  - Computer clusters
  - Social network analysis

##### K-means algorithm
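
A sketch of the standard k-means formulation. The notation is an assumption and not taken from these notes: $m$ examples $x^{(i)}$, $K$ centroids $\mu_k$, and $c^{(i)}$ as the index of the centroid assigned to example $i$. The algorithm alternates the two steps below until the assignments stop changing, and neither step increases the distortion $J$.

$$
c^{(i)} := \arg\min_{k} \lVert x^{(i)} - \mu_k \rVert^2 \quad \text{(cluster assignment step)}
$$

$$
\mu_k := \frac{1}{\lvert \{ i : c^{(i)} = k \} \rvert} \sum_{i \,:\, c^{(i)} = k} x^{(i)} \quad \text{(centroid update step)}
$$

$$
J = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2
$$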

##### Fine tuning k-means

- Use randomly chosen examples from the input data set as the initial centroids.
- Run k-means multiple times with different randomly chosen initial centroids and keep the best run, to reduce the chance of getting stuck in a local optimum.
- Use the elbow method to pick the number of clusters, although it is often ineffective on real-world problems because the curve may show no clear elbow.
- Deciding the number of clusters requires human insight into the problem domain.
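
The tips above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names (`kmeans`, `best_of_restarts`), the toy `points`, and the choice of 10 restarts are all assumptions made for the example. It initializes centroids from the data, keeps the lowest-distortion run out of several restarts, and records the distortion for each k, which is the data an elbow plot would show.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iters=100):
    """Run k-means once; return (centroids, assignments, distortion)."""
    # Initial centroids are k examples drawn at random from the data set.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Distortion: mean squared distance from each point to its centroid.
    distortion = sum(
        squared_distance(p, centroids[a]) for p, a in zip(points, assignments)
    ) / len(points)
    return centroids, assignments, distortion


def best_of_restarts(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the run with the lowest distortion."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


# Toy data (made up for illustration): three loose groups in 2-D.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.8),
]

# Distortion of the best run for each candidate k (the elbow-method data).
distortion_by_k = {k: best_of_restarts(points, k)[2] for k in range(1, 6)}
for k, value in distortion_by_k.items():
    print(k, round(value, 3))
```

Plotting `distortion_by_k` against k gives the elbow curve. When the curve has no sharp bend, the choice of k falls back on what the clusters will be used for.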

##### Dimensionality reduction

- Visualization: reducing the number of features to 2-3 makes the data easier to plot. The reduced features are for inspection and are not necessarily used to train a learning algorithm.
- Compression: saves disk space, especially in computer vision.
- Speed: a learning algorithm runs faster on fewer features, provided most of the variance is retained.
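
One common way to reduce dimensions while keeping most of the variance is PCA. The sketch below assumes NumPy is available and computes PCA through an SVD of the mean-centered data. The function names (`pca_fit`, `pca_project`, `pca_reconstruct`), the synthetic data, and the 99% variance threshold are assumptions chosen for illustration, not something these notes prescribe.

```python
import numpy as np


def pca_fit(X, variance_to_retain=0.99):
    """Return (mean, components) using the fewest components that
    retain at least the requested fraction of the variance."""
    mean = X.mean(axis=0)
    X_centered = X - mean  # PCA requires mean-centered features
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values ** 2  # proportional to variance per direction
    retained = np.cumsum(explained) / explained.sum()
    # First position where the cumulative retained variance reaches the target.
    k = int(np.searchsorted(retained, variance_to_retain)) + 1
    k = min(k, len(singular_values))
    return mean, Vt[:k]


def pca_project(X, mean, components):
    """Map data to the reduced k-dimensional representation."""
    return (X - mean) @ components.T


def pca_reconstruct(Z, mean, components):
    """Map reduced data back to an approximation in the original space."""
    return Z @ components + mean


# Synthetic data: 3 features that mostly vary along a single direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))

mean, components = pca_fit(X)
Z = pca_project(X, mean, components)
X_approx = pca_reconstruct(Z, mean, components)
print(X.shape, "->", Z.shape)
```

Storing `Z` together with `mean` and `components` is the compression use case, and training on `Z` instead of `X` is the speed-up use case.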

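Don’t use PCA to address overfitting. It reduces features without looking at the labels, so it may discard information that matters; use regularization instead.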