Regularization
Regularization constrains the model to fewer effective degrees of freedom by shrinking the beta estimates towards zero in order to avoid overfitting, i.e. it guards against an overly complex or flexible model. Regularization reduces the variance of the model considerably without a substantial increase in its bias. λ is the tuning parameter used to penalize the flexibility of the model: as λ increases, it shrinks the beta estimates (except the intercept, which is not penalized), further reducing variance. Beyond a certain threshold, however, the bias starts to increase as the model loses important information, resulting in underfitting. Note that regularization penalizes large coefficient values, so the influence of the corresponding terms is reduced.
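A minimal sketch of this effect, using scikit-learn's Ridge, where the alpha parameter plays the role of λ; the synthetic dataset and the alpha grid are illustrative assumptions, not from the original text:

```python
# Sketch: effect of the tuning parameter (alpha here, lambda in the text)
# on Ridge coefficient estimates. Data and alpha values are assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # Larger alpha shrinks the beta estimates; the intercept is not penalized.
    print(f"alpha={alpha:>6}: coefs={np.round(model.coef_, 2)}, "
          f"intercept={model.intercept_:.2f}")
```

Running this shows the coefficients shrinking towards zero as alpha grows, while the intercept stays essentially unchanged.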
Lasso uses the absolute values (modulus) of the beta estimates as its penalty, known as the L1 norm, while Ridge uses the squares of the beta estimates, known as the L2 norm, as written out below.
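In standard notation (the symbols here are the usual ones for linear regression, not given in the original), the two penalized objectives can be written as:

```latex
% Ridge (L2 norm): penalize the sum of squared coefficients
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
  + \lambda \sum_{j=1}^{p} \beta_j^2

% Lasso (L1 norm): penalize the sum of absolute coefficients
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}
  \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
  + \lambda \sum_{j=1}^{p} |\beta_j|
```

Note that in both objectives the sum over the penalty starts at j = 1, so the intercept β₀ is excluded, consistent with the remark above that the intercept is not shrunk.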
When λ is large enough, Lasso can shrink some of the beta estimates exactly to zero, which amounts to feature selection. By contrast, Ridge only shrinks the betas close to zero but never exactly to zero, so all variables are retained in the model.
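A sketch contrasting the two behaviors; the dataset (where only 3 of 10 features carry signal) and the shared alpha value are assumptions chosen to make the difference visible:

```python
# Sketch: with a large enough alpha, Lasso drives some coefficients exactly
# to zero (feature selection), while Ridge only shrinks them toward zero.
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import make_regression

# Only 3 of the 10 features actually carry signal.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)

print("Lasso coefs:", np.round(lasso.coef_, 2))  # uninformative features at 0
print("Ridge coefs:", np.round(ridge.coef_, 2))  # small but nonzero
```

The printed Lasso coefficients for the uninformative features are exactly zero, while the Ridge coefficients for the same features are merely small.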