- Fraud detection in e-commerce
- Manufacturing: quality assurance by flagging anomalous components.
- Monitoring computers in data centers
- The algorithm works best when each feature is roughly Gaussian-distributed.
- It usually still works even if a feature's distribution is not Gaussian.
- Non-Gaussian features can be transformed so that their distribution looks more Gaussian.
- For example, a feature x can be replaced by log(x), log(x + c), sqrt(x), etc.
- Do error analysis: if p(x) is comparably large for both normal and anomalous examples, look for new features that help distinguish the anomalies.
- Choose features that take on unusually large or small values in the event of an anomaly.
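The bullets above can be illustrated with a minimal, hypothetical sketch using only the Python standard library. It fits an independent Gaussian to each feature, scores p(x) as the product of the per-feature densities, and flags x when p(x) < ε. The machine readings, the `engineered_features` choice (a log(x + 1) transform of traffic plus a CPU-load/traffic ratio), and ε = 1e-3 are all illustrative assumptions. They are not values from these notes.

```python
import math

def fit(X):
    """Estimate mu_j and sigma_j^2 for every feature j (maximum likelihood, divides by m)."""
    m, n = len(X), len(X[0])
    params = []
    for j in range(n):
        col = [row[j] for row in X]
        mu = sum(col) / m
        var = sum((v - mu) ** 2 for v in col) / m
        params.append((mu, var))
    return params

def p(x, params):
    """p(x) = product over j of N(x_j; mu_j, sigma_j^2): features treated as independent."""
    prob = 1.0
    for xj, (mu, var) in zip(x, params):
        prob *= math.exp(-((xj - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return prob

def raw_features(cpu_load, traffic):
    return [cpu_load, traffic]

def engineered_features(cpu_load, traffic):
    # Hypothetical choices: a log(x + 1) transform of traffic, and a ratio
    # that becomes unusually large when CPU load is high relative to traffic.
    return [cpu_load, math.log(traffic + 1), cpu_load / traffic]

# Hypothetical (cpu_load, network_traffic) readings from normal machines.
train = [(2.0, 10.0), (4.0, 20.0), (6.0, 30.0), (3.0, 14.0),
         (5.0, 26.0), (8.0, 40.0), (1.0, 6.0), (7.0, 33.0)]
suspect = (6.0, 20.0)  # each reading is typical, but CPU is high for this traffic

raw_params = fit([raw_features(*r) for r in train])
eng_params = fit([engineered_features(*r) for r in train])
epsilon = 1e-3  # illustrative; normally chosen on a labeled cross-validation set

print(p(engineered_features(*suspect), eng_params) < epsilon)  # True: flagged
```

With only the raw features, the suspect machine scores higher than some normal machines because each of its readings is individually typical. The ratio feature takes an unusually large value for it, so p(x) drops far below ε.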
Multivariate Gaussian distribution
| Original Gaussian model | Multivariate Gaussian model |
|---|---|
| Requires manually creating features, derived from the raw features, to capture anomalies. | Automatically captures correlations between features. |
| Computationally cheaper. | Computationally expensive and scales poorly with the number of features n, because it must invert the n × n covariance matrix Σ. |
| Works even when the training set is small. | Requires more training examples than features (m > n), otherwise Σ is not invertible; as a rule of thumb, use it only when m > 10n. |
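To make the comparison concrete, here is a minimal two-feature sketch in standard-library Python using hypothetical toy data. It evaluates the multivariate density p(x; μ, Σ) = exp(-½ (x - μ)^T Σ^(-1) (x - μ)) / ((2π)^(n/2) |Σ|^(1/2)), with μ and Σ fitted by maximum likelihood. The 2 × 2 inverse is written out in closed form; for general n one would use a linear-algebra library. Setting the off-diagonal entries of Σ to zero recovers the original per-feature model.

```python
import math

def fit_mvn_2d(X):
    """Maximum-likelihood mean vector and covariance Sigma = (1/m) sum (x - mu)(x - mu)^T."""
    m = len(X)
    mu = [sum(x[0] for x in X) / m, sum(x[1] for x in X) / m]
    s00 = sum((x[0] - mu[0]) ** 2 for x in X) / m
    s11 = sum((x[1] - mu[1]) ** 2 for x in X) / m
    s01 = sum((x[0] - mu[0]) * (x[1] - mu[1]) for x in X) / m
    return mu, [[s00, s01], [s01, s11]]

def mvn_pdf_2d(x, mu, sigma):
    """Multivariate Gaussian density for n = 2, so (2 pi)^(n/2) = 2 pi."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    if det <= 0:
        raise ValueError("Sigma is singular: need m > n and no redundant features")
    inv = [[d / det, -b / det], [-b / det, a / det]]  # closed-form 2x2 inverse
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# Hypothetical, strongly correlated training data (m = 5 > n = 2).
X = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
mu, sigma = fit_mvn_2d(X)
# Zeroing the off-diagonal terms gives the original (independent-features) model.
diag = [[sigma[0][0], 0.0], [0.0, sigma[1][1]]]

off_trend = (1.5, 4.5)  # each coordinate is in range, but it breaks the correlation
print(mvn_pdf_2d(off_trend, mu, sigma), mvn_pdf_2d(off_trend, mu, diag))
```

Both coordinates of `off_trend` lie inside the training range. As a result, the original (diagonal) model scores it higher than the training point (1, 1.1). The multivariate model sees the strong correlation and gives it a density far below every training example.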
- Movie and book recommendations.
- Product recommendations for online shopping.
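Content-based recommender systems
- Essentially a variant of linear regression, fitted separately for each user.
- Learn a parameter vector θ^(j) for each user j from the movies that user has rated.
- Predict user j's rating of movie i as (θ^(j))^T x^(i), where x^(i) is the feature vector of movie i; movies with high predicted ratings are the ones the user is likely to enjoy.
- Requires features describing each movie's content, such as its degree of action or romance, which are often difficult to obtain in the real world.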
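The per-user regression can be sketched in standard-library Python. This is a minimal sketch, and it assumes the common regularized squared-error objective J(θ^(j)) = ½ Σ_{i: r(i,j)=1} ((θ^(j))^T x^(i) - y^(i,j))² + (λ/2) Σ_{k=1..n} (θ_k^(j))², minimized by batch gradient descent. It also assumes a bias feature x_0 = 1 that is not regularized. The movie feature values, ratings, λ, learning rate, and iteration count are all hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def train_user(rated, lam=0.1, alpha=0.1, iters=5000):
    """Fit theta for one user by batch gradient descent.

    rated: list of (x, y) pairs, where x = [1.0, x_1, ..., x_n] (x_0 = 1 is the bias)
    and y is the user's rating of that movie.
    """
    theta = [0.0] * len(rated[0][0])
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x, y in rated:
            err = dot(theta, x) - y
            for k, xk in enumerate(x):
                grad[k] += err * xk
        for k in range(1, len(theta)):  # theta_0 (bias) is not regularized
            grad[k] += lam * theta[k]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical movie features: [x_0 = 1, romance, action].
movies = {
    "love_at_last": [1.0, 0.9, 0.0],
    "romance_forever": [1.0, 1.0, 0.01],
    "cute_puppies_of_love": [1.0, 0.99, 0.0],
    "nonstop_car_chases": [1.0, 0.1, 1.0],
    "swords_vs_karate": [1.0, 0.0, 0.9],
}
# Hypothetical ratings (0-5) from one user; "cute_puppies_of_love" is unrated.
ratings = {"love_at_last": 5, "romance_forever": 5,
           "nonstop_car_chases": 0, "swords_vs_karate": 0}

theta = train_user([(movies[t], y) for t, y in ratings.items()])
predictions = {t: dot(theta, x) for t, x in movies.items()}
print(predictions["cute_puppies_of_love"])
```

The unrated movie's romance and action features resemble those of the two movies this user rated 5, so its predicted rating comes out close to 5. The two action movies are predicted near 0.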