Advice on machine learning
If your machine learning algorithm is not working well, you can try the following things.
- Get more training examples.
- Try a smaller set of features.
- Try additional features.
- Add polynomial features.
- Decrease or increase λ, the regularization parameter.
Machine learning diagnostic.
A test you can run to gain insight into what is or isn’t working with a learning algorithm, and to gain guidance on how best to improve its performance.
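- Use part of the training examples as a test set.
- Typically a random 30% of the training examples are held out as the test set.
- If the hypothesis overfits, the error on the training set will be low, but the error on the test set will be high.
- Model selection, for example choosing the degree of polynomial:
- Linear, quadratic, cubic, etc.
- Keep 20% of the training examples as a cross validation set and use it to choose between models, so the test set still gives a fair estimate of the error.
- 60% training set, 20% cross validation set, 20% test set.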
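The 60/20/20 split and the choice of polynomial degree can be sketched as below. This is a minimal illustration rather than a definitive implementation: the synthetic quadratic data, the helper names (`split_60_20_20`, `fit_polynomial`, `average_squared_error`, `select_degree`) and the unregularized least-squares fit are all assumptions made for the example.

```python
import random

def split_60_20_20(examples, seed=0):
    """Shuffle, then split into 60% training, 20% cross validation, 20% test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10
    n_cv = n * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])

def fit_polynomial(data, degree):
    """Least-squares polynomial fit via the normal equations (no regularization)."""
    size = degree + 1
    # Augmented matrix [X^T X | X^T y] for the features 1, x, ..., x^degree.
    a = [[sum(x ** (i + j) for x, _ in data) for j in range(size)]
         + [sum(y * x ** i for x, y in data)]
         for i in range(size)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(size):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][size] / a[i][i] for i in range(size)]

def average_squared_error(weights, data):
    """J = 1/(2m) * sum of squared prediction errors."""
    total = 0.0
    for x, y in data:
        prediction = sum(w * x ** k for k, w in enumerate(weights))
        total += (prediction - y) ** 2
    return total / (2 * len(data))

def select_degree(train, cv, degrees):
    """Fit each degree on the training set, pick the lowest cross validation error."""
    cv_errors = {d: average_squared_error(fit_polynomial(train, d), cv)
                 for d in degrees}
    return min(cv_errors, key=cv_errors.get), cv_errors

# Synthetic data (an assumption for this sketch): a noisy quadratic.
rng = random.Random(1)
examples = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    examples.append((x, 1 + 2 * x - 3 * x * x + rng.gauss(0, 0.1)))

train, cv, test = split_60_20_20(examples)
best_degree, cv_errors = select_degree(train, cv, degrees=range(1, 6))
# The test set is touched only once, after the degree has been chosen.
test_error = average_squared_error(fit_polynomial(train, best_degree), test)
print(best_degree, cv_errors, test_error)
```

The degree is chosen on the cross validation set only, so the final test error is not biased by the selection step.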
High bias or variance problem.
- Bias problem - training set and cross validation set errors will both be high.
- Variance problem - training set error is low, but cross validation error is high.
- Learning curves: plot average squared error against training set size.
- Plot the average squared error for both the cross validation set (J_CV) and the training set (J_train).
- As the training set gets larger, the training error of a fixed hypothesis such as a quadratic function increases, since it becomes harder to fit every example.
- The error will plateau after a certain training set size m.
- High bias - J_CV and J_train will both be high and will converge for a sufficiently large training set. More data will not help.
- High variance - J_train ≪ J_CV, with a gap between the two curves that closes only slowly. More data will likely help.
- Smaller neural networks are prone to underfitting.
- Larger neural networks are prone to overfitting, but this can be addressed using regularization.
- The number of hidden layers plays a role similar to the degree of polynomial, and the same technique (comparing cross validation error) can be used to choose it.
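The learning-curve diagnostic described above can be sketched as follows. This is an illustration under stated assumptions: the data is a synthetic noisy quadratic, the model is a deliberately too-simple straight line (a high bias case), and the helper names `fit_line`, `cost` and `learning_curve` are made up for the example.

```python
import random

def fit_line(data):
    """Closed-form least-squares fit of y = w0 + w1 * x."""
    m = len(data)
    mean_x = sum(x for x, _ in data) / m
    mean_y = sum(y for _, y in data) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    w1 = sxy / sxx if sxx else 0.0
    return mean_y - w1 * mean_x, w1

def cost(weights, data):
    """Average squared error J = 1/(2m) * sum of squared prediction errors."""
    w0, w1 = weights
    return sum((w0 + w1 * x - y) ** 2 for x, y in data) / (2 * len(data))

def learning_curve(train, cv, sizes):
    """For each size m, fit on the first m training examples and report
    (m, J_train on those m examples, J_CV on the full cross validation set)."""
    rows = []
    for m in sizes:
        weights = fit_line(train[:m])
        rows.append((m, cost(weights, train[:m]), cost(weights, cv)))
    return rows

# Synthetic data (an assumption for this sketch): y = x^2 plus noise.
rng = random.Random(0)

def sample(n):
    points = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        points.append((x, x * x + rng.gauss(0, 0.1)))
    return points

train = sample(300)
cv = sample(100)
curve = learning_curve(train, cv, [2, 5, 10, 50, 100, 300])
for m, j_train, j_cv in curve:
    print(f"m={m:4d}  J_train={j_train:.4f}  J_CV={j_cv:.4f}")
```

With only two examples the line fits the training data essentially perfectly, so J_train starts near zero. As m grows, both errors settle at a high value, which is the high bias pattern. Swapping in a more flexible model would be the way to look for the high variance pattern instead.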
Machine Learning System Design
- How should you spend your time to reduce the error of your algorithm?
- Collect lots of data.
- Develop sophisticated features.
- Error analysis
- Start with a simple algorithm.
- Plot learning curves to decide whether you need more features or data.
- Eyeball the examples where the algorithm is going wrong.
- Use a single numerical evaluation metric for your algorithm, such as the error rate.
- This metric helps you decide whether your changes are useful or not.
Skewed classes in classification algorithms.
- The data may be skewed, with far more examples of one class than of another.
- For example, far more samples (99%) come from patients who don’t have cancer than from the small group of patients who do.
- Throwing out data to reduce skew is incorrect, as the skew is representative of the real world.
- Plain error rate is not very useful for judging such an algorithm: always predicting “no cancer” in the example above already gives only 1% error.
- In classification, you can move the threshold away from 0.5. A higher value makes the algorithm predict the positive class only when very confident, while a lower value makes it miss fewer actual positives.
- Precision (P) - true positives / (true positives + false positives)
- What fraction of predicted positives were actually positive.
- A good algorithm will have a value close to 1.
- Recall (R) - true positives / (true positives + false negatives)
- What fraction of actual positives were predicted correctly.
- A good algorithm will have a value close to 1.
- The trade-off between precision and recall is hard and depends on context.
- Trying out different algorithms and comparing P and R for them is a good way to determine which algorithm is better.
- F score - 2 * (P * R) / (P + R) - the higher the value, the better.
- This score is used to compare the performance of various algorithms.
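The precision, recall and F score formulas above, and the effect of moving the threshold, can be sketched as below. The labels, scores and function names (`precision_recall_f`, `accuracy`) are made up for illustration. Returning 0.0 when a denominator is zero is a convention assumed here, not something the formulas define.

```python
def precision_recall_f(y_true, scores, threshold=0.5):
    """Return (precision, recall, F score) for the positive class (label 1)."""
    tp = fp = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((1 if s >= threshold else 0) == a
                  for a, s in zip(y_true, scores))
    return correct / len(y_true)

# Hypothetical labels and predicted probabilities of the positive class.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

for threshold in (0.35, 0.5, 0.8):
    p, r, f = precision_recall_f(y_true, scores, threshold)
    print(f"threshold={threshold}: P={p:.2f} R={r:.2f} F={f:.2f}")

# Skewed classes: 1 positive in 100, and a classifier that always says "negative".
y_skewed = [1] + [0] * 99
always_negative = [0.0] * 100
print(accuracy(y_skewed, always_negative))
print(precision_recall_f(y_skewed, always_negative))
```

Raising the threshold trades recall for precision, and lowering it does the opposite. The always-negative classifier on the skewed data scores high on accuracy while its recall is zero, which is why the error rate alone is misleading here.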
Data for machine learning
- A very large data set can make even an “inferior algorithm” work better, almost on par with better algorithms.
- However, having a large data set is not sufficient on its own.
- Is the data useful, and does it contain the information needed to build a good enough algorithm?
- Can a human expert confidently predict the output from an example and its set of features?
- For example, you can’t predict a housing price from data that contains only the size of the house, irrespective of how many examples you have.