Week 8 Machine Learning
Unsupervised learning
- Data does not have labels (y values)
 - Clustering is a way to find structures with in data sets.
 - Applications of clustering
    
- Market segmentation
 - Astronomy
 - Computer clusters
 - Social network analysis.
 
 
K-means algorithm

Fine tuning k-means
- Use randomly chosen inputs from input data set as initial centroids.
 - Run k means multiple times with different randomly chosen centroids to reduce chance of getting stuck in local optima.
 - Use elbow method to find optimum number of clusters. Although elbow method is quite ineffective for most real world problems.
 - Deciding number of clusters requires human insight into the problem domain.
 
Dimensionality reduction
- Useful to reduce number of features to 2-3 to aid in visualizing data. May not necessarily be used to write machine learning algorithms.
 - Compression - to save disk space, specially with computer vision.
 - Speed up learning algorithm by reducing number of features without losing variance.
 
Don’t use PCA to solve overfitting.
Principal Component analysis

How to choose number of Principal components
