# Week 2 Machine Learning

### Multivariate linear regression

##### Hypothesis function

- Multiple features, a single outcome variable.
- Definition of the hypothesis function: h_{θ}(x) = θ_{0} + θ_{1}x_{1} + θ_{2}x_{2} + … + θ_{n}x_{n}
- Define x_{0} = 1, so h_{θ}(x) = θ_{0}x_{0} + θ_{1}x_{1} + θ_{2}x_{2} + … + θ_{n}x_{n}
- x is an (n+1)-vector, θ is an (n+1)-vector
- Compactly: h_{θ}(x) = θ^{T}x
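
Since h_{θ}(x) = θ^{T}x, predictions for all m training examples collapse into a single matrix product Xθ once the x_{0} = 1 column is prepended. A minimal NumPy sketch, with made-up data values:

```python
import numpy as np

# Made-up design matrix: one row per training example (sq ft, bedrooms)
X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [1534.0, 3.0]])
X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend the x_0 = 1 column

theta = np.array([50.0, 0.1, 20.0])  # (n+1)-vector of parameters, made up

h = X @ theta  # h_θ(x) = θᵀx, computed for every example at once
print(h)
```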

##### Gradient descent for multiple variables
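
Each iteration updates every parameter simultaneously: θ_{j} := θ_{j} − α(1/m)Σ_{i}(h_{θ}(x^{(i)}) − y^{(i)})x_{j}^{(i)}, for j = 0, …, n. A vectorized sketch, where the learning rate and iteration count are arbitrary defaults:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for linear regression.

    X: (m, n+1) design matrix with the x_0 = 1 column already prepended.
    y: (m,) target vector.
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        # Simultaneous update: θ := θ − α(1/m) Xᵀ(Xθ − y)
        theta -= (alpha / m) * X.T @ (X @ theta - y)
    return theta
```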

##### Feature scaling

- If the ranges of the features differ widely, the contours of J(θ) become skewed and gradient descent converges slowly.
- To speed up convergence, scale all features so that each range is roughly 0 to 1.
- Example: feature 1 takes values from 0 to 10000, so scaled feature 1 = (feature 1)/10000, which takes values from 0 to 1.
- If a feature can be negative, the scaled values will fall between −1 and 1.
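
A one-line version of that rescaling in NumPy, with made-up values:

```python
import numpy as np

sq_feet = np.array([2104.0, 1416.0, 852.0, 9800.0])  # made-up values
scaled = sq_feet / sq_feet.max()  # divide by the largest value: now in [0, 1]
print(scaled)
```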

##### Mean normalization

- Convergence also improves when all features have a similar mean (ideally zero).
- Normalize each feature by subtracting its mean, so that the feature is centred at zero.

##### Formula for feature scaling and mean normalization

x_{i} := (x_{i} − μ_{i}) / S_{i}

- μ_{i} = average value of the feature
- S_{i} = max value − min value of the feature
- This puts each feature into a range of roughly −0.5 to 0.5
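
Applied column-wise to a feature matrix, the transform looks like this (data values are made up):

```python
import numpy as np

def mean_normalize(X):
    """Scale each column to roughly [-0.5, 0.5]: (x − μ) / (max − min)."""
    mu = X.mean(axis=0)                      # μ_i: per-feature average
    spread = X.max(axis=0) - X.min(axis=0)   # S_i: per-feature range
    return (X - mu) / spread

X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [852.0, 4.0]])
print(mean_normalize(X))
```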

##### Practical tips for gradient descent

- The best way to debug gradient descent is to plot J(θ) against the number of iterations; J(θ) should decrease on every iteration.
- Learning rate too small → slow convergence.
- Learning rate too large → J(θ) may increase or fail to converge.
- Try α = 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1… (roughly ×3 steps).
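
A sketch of that debugging loop: run gradient descent while recording J(θ), then plot one curve per candidate α (assumes NumPy and matplotlib; the data is made up and already normalized):

```python
import numpy as np
import matplotlib.pyplot as plt

def cost(X, y, theta):
    """J(θ) = (1/(2m)) Σ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²"""
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))

def descend_with_history(X, y, alpha, num_iters=400):
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta -= (alpha / len(y)) * X.T @ (X @ theta - y)
        history.append(cost(X, y, theta))  # should shrink every iteration
    return theta, history

# Made-up, pre-normalized data with an x_0 = 1 column
X = np.array([[1.0, 0.4], [1.0, -0.3], [1.0, 0.1]])
y = np.array([2.0, 0.5, 1.0])

for alpha in (0.001, 0.01, 0.1, 0.3):
    _, history = descend_with_history(X, y, alpha)
    plt.plot(history, label=f"α = {alpha}")
plt.xlabel("iteration"); plt.ylabel("J(θ)"); plt.legend(); plt.show()
```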

##### Polynomial regression
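
Polynomial regression reuses the same linear machinery by treating powers of a feature as additional features, e.g. h_{θ}(x) = θ_{0} + θ_{1}x + θ_{2}x^{2} + θ_{3}x^{3}; feature scaling then matters even more, because x, x^{2}, and x^{3} have very different ranges. A minimal sketch with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up single feature
X_poly = np.column_stack([x, x**2, x**3])  # treat x², x³ as new features
# Mean-normalize each column, since the ranges now differ wildly
X_poly = (X_poly - X_poly.mean(axis=0)) / (X_poly.max(axis=0) - X_poly.min(axis=0))
X_poly = np.hstack([np.ones((len(x), 1)), X_poly])  # prepend x_0 = 1
# X_poly can now be fed to the same gradient descent as before
```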

##### Normal equation

θ = (X^{T}X)^{-1}X^{T}y
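
A direct NumPy translation; np.linalg.pinv stands in for the literal inverse so the sketch also handles a non-invertible X^{T}X (data values are made up):

```python
import numpy as np

def normal_equation(X, y):
    """θ = (XᵀX)⁻¹ Xᵀy, via the pseudo-inverse for numerical safety."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Made-up data with the x_0 = 1 column prepended
X = np.array([[1.0, 2104.0], [1.0, 1416.0], [1.0, 852.0]])
y = np.array([400.0, 232.0, 178.0])
print(normal_equation(X, y))  # no learning rate, no iterations
```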

| Gradient descent | Normal equation |
| --- | --- |
| Need to choose a learning rate α | No need to choose α |
| Needs many iterations | No need to iterate |
| O(kn^{2}) | O(n^{3}) to compute (X^{T}X)^{-1} |
| Works well even when n is large | Slow if n is very large |