- Data does not have labels (y values)
- Clustering is a way to find structure within data sets.
- Applications of clustering
- Market segmentation
- Organizing computing clusters
- Social network analysis
Fine-tuning k-means
- Use randomly chosen examples from the input data set as the initial centroids.
- Run k-means multiple times with different randomly chosen initial centroids and keep the run with the lowest cost, to reduce the chance of getting stuck in a local optimum.
- Use the elbow method to pick the number of clusters, although it is often ineffective on real-world problems because the cost curve rarely shows a clear elbow.
- Deciding the number of clusters usually requires human insight into the problem domain.
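A minimal sketch of the random initialization, restart, and elbow ideas above, in Python with NumPy. The function names (`kmeans`, `best_of_restarts`), the toy data, and the use of mean squared distance as the cost are assumptions for illustration, not taken from these notes.

```python
import numpy as np

def kmeans(X, k, rng, n_iters=100):
    """One run of k-means. Returns (centroids, labels, cost)."""
    # Initial centroids: k distinct, randomly chosen training examples.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Cost: mean squared distance from each point to its assigned centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    cost = np.mean(dists[np.arange(len(X)), labels] ** 2)
    return centroids, labels, cost

def best_of_restarts(X, k, n_restarts=10, seed=0):
    """Run k-means n_restarts times and keep the lowest-cost run."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

# Toy data (hypothetical): three 2-D blobs.
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * data_rng.standard_normal((50, 2)) for c in centers])

# Elbow method: compute the best cost for each k and look for the bend.
costs = {k: best_of_restarts(X, k)[2] for k in range(1, 7)}
for k, cost in costs.items():
    print(k, round(cost, 3))
```

On clean toy data like this the bend is easy to see. On real data the curve is often smooth, which is why the choice of k usually falls back on domain knowledge.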
- Visualization - reduce the number of features to 2 or 3 so the data can be plotted. The reduced features are not necessarily fed into a learning algorithm.
- Compression - to save disk space, especially in computer vision.
- Speed up a learning algorithm by reducing the number of features while retaining most of the variance.
Don’t use PCA to fix overfitting. It discards information without looking at the labels; use regularization instead.
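A minimal sketch of PCA used for feature reduction, in Python with NumPy. It assumes mean-centred data and takes the SVD of the data matrix directly; the helper names (`pca_fit`, `pca_project`, `pca_reconstruct`, `variance_retained`) and the toy data are hypothetical, not from these notes.

```python
import numpy as np

def pca_fit(X):
    """Return the feature means, principal directions, and singular values."""
    mu = X.mean(axis=0)
    # SVD of the centred data; the rows of Vt are the principal directions,
    # ordered from most to least variance.
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt, s

def pca_project(X, mu, Vt, k):
    """Project X onto the first k principal directions."""
    return (X - mu) @ Vt[:k].T

def pca_reconstruct(Z, mu, Vt):
    """Map reduced data Z back to an approximation in the original space."""
    k = Z.shape[1]
    return Z @ Vt[:k] + mu

def variance_retained(s, k):
    """Fraction of total variance kept by the first k components."""
    return np.sum(s[:k] ** 2) / np.sum(s ** 2)

# Toy data (hypothetical): 10 features that lie close to a 2-D plane.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
W = rng.standard_normal((2, 10))
X = latent @ W + 0.01 * rng.standard_normal((200, 10))

mu, Vt, s = pca_fit(X)
Z = pca_project(X, mu, Vt, k=2)        # 200 x 2, small enough to plot
X_approx = pca_reconstruct(Z, mu, Vt)  # 200 x 10, compressed approximation
print(Z.shape, variance_retained(s, 2))
```

The same projection serves all three uses: plot `Z` for visualization, store `Z` (plus `mu` and the first rows of `Vt`) for compression, or train on `Z` to speed up learning. If features are on very different scales, scaling them before the fit is usually needed as well.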
Principal component analysis (PCA)
How to choose the number of principal components