Unsupervised learning

  • Data does not have labels (y values)
  • Clustering is a way to find structure within data sets.
  • Applications of clustering
    • Market segmentation
    • Astronomy
    • Organizing computing clusters
    • Social network analysis
K-means algorithm

Fine tuning k-means
  • Initialize the centroids to randomly chosen examples from the input data set.
  • Run k-means multiple times with different random initial centroids and keep the run with the lowest cost, to reduce the chance of getting stuck in a local optimum.
  • Use the elbow method to look for the optimum number of clusters, although it is often ineffective on real-world problems, where the cost curve rarely shows a clear elbow.
  • Deciding the number of clusters often requires human insight into the problem domain.
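
The tips above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names (kmeans, best_of_n), the toy data, the number of restarts and the use of mean squared distance as the cost are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, rng, n_iters=100):
    """One run of k-means; returns (centroids, labels, distortion)."""
    # Initialise the centroids as k distinct, randomly chosen input points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Distortion: mean squared distance from each point to its centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    distortion = np.mean(dists[np.arange(len(X)), labels] ** 2)
    return centroids, labels, distortion

def best_of_n(X, k, n_restarts=20, seed=0):
    """Run k-means n_restarts times and keep the lowest-distortion run."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(X, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

# Hypothetical toy data: three well-separated 2-D blobs.
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * data_rng.standard_normal((50, 2)) for c in centers])

# Elbow method: compute the best distortion for several values of k
# and look for the point where the curve stops dropping sharply.
distortions = {k: best_of_n(X, k)[2] for k in range(1, 7)}
for k, d in distortions.items():
    print(k, round(float(d), 3))
```

On clean toy data like this the drop in distortion flattens visibly after k = 3. On real data the curve is usually smoother, which is why the elbow is hard to read.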
Dimensionality reduction
  • Visualization - reducing the data to 2 or 3 features makes it possible to plot. This is for exploring the data and does not necessarily feed into a learning algorithm.
  • Compression - to save disk space, especially in computer vision.
  • Speed up a learning algorithm by reducing the number of features while retaining most of the variance.
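
As a rough sketch of what reducing the number of features looks like in practice, the example below projects data onto its top principal components using NumPy's SVD and then maps it back. The helper names (pca_project, pca_reconstruct) and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA assumes mean-centred data
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    Z = Xc @ components.T  # reduced representation, shape (n_samples, k)
    return Z, components, mean

def pca_reconstruct(Z, components, mean):
    """Map the reduced data back to the original feature space (approximately)."""
    return Z @ components + mean

# Hypothetical data: 200 samples with 10 features that lie close to a 2-D plane.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
mixing = rng.standard_normal((2, 10))
X = latent @ mixing + 0.01 * rng.standard_normal((200, 10))

Z, components, mean = pca_project(X, k=2)
X_approx = pca_reconstruct(Z, components, mean)

# Fraction of the total variance lost by keeping only 2 components.
rel_error = np.sum((X - X_approx) ** 2) / np.sum((X - mean) ** 2)
print(Z.shape, rel_error)
```

Z can be plotted directly for visualization, stored in place of X for compression, or passed to a learning algorithm as a smaller feature set.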

Don’t use PCA to solve overfitting. PCA discards information without looking at the labels; use regularization instead.

Principal component analysis (PCA)

How to choose the number of principal components

Choosing k
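
One common heuristic, sketched here under assumptions, is to take k (the number of principal components, not the number of clusters) as the smallest value that retains a chosen fraction of the variance. The 0.99 threshold, the function name choose_k and the synthetic data are illustrative choices, not something these notes prescribe.

```python
import numpy as np

def choose_k(X, retain=0.99):
    """Smallest number of components whose cumulative variance is >= retain."""
    Xc = X - X.mean(axis=0)
    # Squared singular values are proportional to the variance along each component.
    S = np.linalg.svd(Xc, compute_uv=False)
    explained = S**2 / np.sum(S**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative share reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, retain)) + 1
    # Clamp in case rounding keeps the final cumulative value just below retain.
    return min(k, len(S)), cumulative

# Hypothetical data: 8 features driven by 3 underlying directions plus small noise.
rng = np.random.default_rng(1)
latent = rng.standard_normal((300, 3))
mixing = rng.standard_normal((3, 8))
X = latent @ mixing + 0.01 * rng.standard_normal((300, 8))

k, cumulative = choose_k(X, retain=0.99)
print(k, np.round(cumulative, 4))
```

Lowering retain trades reconstruction accuracy for fewer components. The right value depends on the application.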