Bias-Variance tradeoff


Bias refers to the deviation of the predicted values from the correct values. It is the error you introduce by making overly strong (or wrong) assumptions about the data. In other words, it is the error created when you represent a complex real-life problem with a simpler model: the model becomes easier to understand, but also less flexible. For instance, building a linear model to solve a non-linear problem results in underfitting. Parametric algorithms like Linear Regression tend to have high bias, while non-parametric algorithms like Decision Trees make fewer assumptions about the training data and the target function and hence tend to have low bias.
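As a quick illustration (a minimal sketch, assuming scikit-learn and NumPy are available and using synthetic data), fitting a straight line to data generated from a sine curve shows the kind of underfitting that high bias produces:

```python
# A minimal sketch: a linear model fit to data from a non-linear (sine) function.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

# Even on its own training data, the straight line cannot follow the sine curve,
# so the error stays high -- the signature of underfitting / high bias.
print("Training MSE of linear model:", mean_squared_error(y, pred))
```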

Variance refers to how much the model changes when it is trained on different training data. It occurs when the model captures not just the underlying pattern but the noise as well, which results in overfitting. In other words, the model is memorising the data. This is often observed in Decision Trees: when we don't limit the maximum depth, the tree can keep growing until there is a leaf node for every observation. Similarly, when observations are limited but the number of parameters is high (for example, many correlated features causing multicollinearity in a regression), the estimates become unstable, resulting in high variance.
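Here is a minimal sketch of that behaviour (again assuming scikit-learn and synthetic data): an unrestricted Decision Tree nearly memorises the training set but generalises worse than a depth-limited one:

```python
# A minimal sketch: an unrestricted tree grows a leaf for (almost) every training
# point, showing high variance; a depth-limited tree generalises better.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep_tree = DecisionTreeRegressor(max_depth=None, random_state=0).fit(X_train, y_train)
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

# The unrestricted tree has near-zero training error (memorisation)
# but a noticeably larger test error than the depth-limited tree.
for name, tree in [("unrestricted", deep_tree), ("max_depth=3", shallow_tree)]:
    print(name,
          "train MSE:", round(mean_squared_error(y_train, tree.predict(X_train)), 3),
          "test MSE:", round(mean_squared_error(y_test, tree.predict(X_test)), 3))
```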

Put differently, bias refers to how much the model ignores the data, and variance refers to how dependent the model is on the data.

Adding more parameters to the model increases its complexity, leading to increased variance and reduced bias. Reducing the number of parameters results in a simpler model, leading to lower variance but higher bias.

The optimally complex model is one where any further reduction in one error results in an equivalent increase in the other, minimising the total error; this is the balance Data Scientists strive for.
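One common way to look for that sweet spot, sketched below with synthetic data and scikit-learn, is to sweep model complexity (here, polynomial degree) and watch the training error keep falling while the validation error follows a U shape; the optimally complex model sits near the bottom of the U:

```python
# A minimal sketch: polynomial degree as a stand-in for model complexity.
# Training error keeps dropping as complexity grows, but validation error
# eventually rises again -- the turning point is the bias-variance sweet spot.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 80)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 2, 4, 8, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree {degree:2d} | "
          f"train MSE {mean_squared_error(y_train, model.predict(X_train)):.3f} | "
          f"val MSE {mean_squared_error(y_val, model.predict(X_val)):.3f}")
```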


