Week 9 Machine Learning
Anomaly detection
Applications
- Fraud detection in e-commerce
- Manufacturing: supports quality assurance by flagging anomalous components.
- Monitoring computers in data centers
Algorithm
Choosing features
- The algorithm works best when each feature has a Gaussian distribution.
- It often still works when the distribution is not Gaussian.
- Features can be transformed so that their distribution looks more Gaussian.
- For example, a feature x can be replaced by log(x), log(x + c), sqrt(x), etc.
- Do error analysis: if p(x) is similar for normal and anomalous examples, look for new features that help the algorithm tell them apart.
- Choose features which take on very small or very large values in the event of an anomaly.
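
The points above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: it assumes p(x) is modeled as a product of independent per-feature Gaussians, and the training data, the `log(x + c)` transform with `c = 1`, and the threshold `EPSILON` are all made-up values chosen for the example.

```python
import math

def fit_gaussian(values):
    """Return the mean and variance of one feature (variance divides by m)."""
    m = len(values)
    mu = sum(values) / m
    var = sum((v - mu) ** 2 for v in values) / m
    return mu, var

def density(x, params):
    """p(x): product of per-feature Gaussian densities (features treated as independent)."""
    p = 1.0
    for xj, (mu, var) in zip(x, params):
        p *= math.exp(-((xj - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p

def transform(x_raw, c=1.0):
    """Assumed transform: log(x + c) applied to every feature."""
    return [math.log(v + c) for v in x_raw]

# Hypothetical training set of normal examples (rows) with two positive features.
raw = [[1.0, 10.0], [2.0, 12.0], [1.5, 9.0], [3.0, 11.0], [2.5, 10.5]]

# Fit one Gaussian per transformed feature (zip(*rows) yields the columns).
params = [fit_gaussian(col) for col in zip(*[transform(row) for row in raw])]

EPSILON = 1e-3  # assumed threshold; in practice it is tuned on labeled validation data

def is_anomaly(x_raw):
    """Flag an example when its density falls below the threshold."""
    return density(transform(x_raw), params) < EPSILON
```

With this toy data, `is_anomaly([2.0, 10.5])` is false because the point sits close to the training examples, while `is_anomaly([200.0, 0.1])` is true because both features are far outside the range seen in training.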
Multivariate Gaussian distribution
Original Gaussian | Multivariate Gaussian |
---|---|
Features that capture unusual combinations of values must be created manually from the raw features. | Automatically captures correlations between features. |
Computationally cheaper. | More expensive to compute; scales poorly with the number of features because it requires inverting the n × n covariance matrix. |
Works even with a small training set. | Requires more training examples than features (m > n), otherwise the covariance matrix is not invertible. In practice, aim for m ≥ 10n. |
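
To make the "captures correlations" row concrete, here is a small sketch restricted to two features so that the covariance matrix can be inverted by hand. The data set, the function names, and the two test points are hypothetical; a real implementation would normally use a linear algebra library for general n.

```python
import math

def fit_multivariate_2d(X):
    """Estimate the mean vector and 2x2 covariance matrix (dividing by m)."""
    m = len(X)
    mu = [sum(row[j] for row in X) / m for j in range(2)]
    sigma = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in X) / m
              for j in range(2)] for i in range(2)]
    return mu, sigma

def multivariate_density_2d(x, mu, sigma):
    """Multivariate Gaussian density for n = 2, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# Hypothetical data: two strongly correlated features, m = 24 examples (m > 10n for n = 2).
X = [[0.5 * i, 0.5 * i + 0.2 * math.sin(i)] for i in range(24)]
mu, sigma = fit_multivariate_2d(X)

# Both points have individually plausible feature values,
# but only the first one respects the correlation between the features.
p_on_trend = multivariate_density_2d([6.0, 6.0], mu, sigma)
p_off_trend = multivariate_density_2d([3.0, 9.0], mu, sigma)
```

Here `p_off_trend` comes out far smaller than `p_on_trend`, even though 3.0 and 9.0 each lie inside the range seen for their feature. A model with independent per-feature Gaussians would need a hand-made feature (for example the difference between the two features) to catch the same case.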
Recommender systems
Applications
- Movie, book recommendations.
- Shopping recommendations.
Content based recommender systems
- Essentially a variation of linear regression.
- We find prediction parameters, θ, for each user in the system.
- These parameters are then used to predict which movies the user will like.
- Requires content-based features for each movie, such as its degree of action or romance, which are difficult to obtain in the real world.
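
As a rough sketch of the idea, the code below fits one user's θ by batch gradient descent on a regularized squared-error cost and then scores unrated movies. The feature values, ratings, learning rate `alpha`, regularization strength `lam`, and step count are all illustrative assumptions; each movie vector starts with a constant 1 for the intercept term.

```python
def predict(theta, features):
    """Predicted rating theta^T x, where features[0] == 1 is the intercept term."""
    return sum(t * f for t, f in zip(theta, features))

def fit_user(movies, ratings, alpha=0.1, lam=0.0, steps=5000):
    """Learn one user's theta with batch gradient descent on squared error."""
    n = len(movies[0])
    theta = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for x, y in zip(movies, ratings):
            err = predict(theta, x) - y
            for j in range(n):
                grad[j] += err * x[j]
        for j in range(n):
            reg = lam * theta[j] if j > 0 else 0.0  # the intercept is not regularized
            theta[j] -= alpha * (grad[j] + reg)
    return theta

# Hypothetical movies this user has rated: [1, degree of action, degree of romance].
rated_movies = [
    [1.0, 0.9, 0.0],
    [1.0, 1.0, 0.1],
    [1.0, 0.1, 1.0],
    [1.0, 0.0, 0.9],
]
ratings = [5.0, 5.0, 0.0, 0.0]  # this user likes action and dislikes romance

theta = fit_user(rated_movies, ratings)

# Score two movies the user has not rated yet.
action_score = predict(theta, [1.0, 0.8, 0.1])
romance_score = predict(theta, [1.0, 0.1, 0.8])
```

For this user, `action_score` ends up well above `romance_score`, so the action-heavy movie would be recommended first. In a full system the same fit would be repeated for every user, each with their own θ.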