Posts

Ensemble

Boosting: Instead of training all the models independently as in bagging, boosting trains models sequentially; each new model is trained to correct the errors made by the previous ones. After the first tree is built, the weights of the observations that are hard to classify are increased and the weights of those that are easy to classify are reduced, and this re-weighted data is used to build the next tree. The process is repeated for a defined number of iterations, and the prediction of the final ensemble is the weighted sum of the predictions made by the individual trees. GBM generalises this by minimising a chosen loss function: since each tree is fit to the residuals rather than to the original target, each tree stays small and improves the prediction in the regions where it is currently poor. In a standard ensemble, all the models might make the same mistake; boosting targets those mistakes directly. Concretely: compute the error by subtracting the forecast from the target (e1 = y - y1_forecasted), build a new model with that error as the target variable (giving e1_forecasted), add it to the ensemble, and repeat, as the sketch below shows.
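A minimal sketch of that residual-fitting loop, assuming squared-error loss and shallow scikit-learn regression trees; the data, tree depth and learning rate here are illustrative, not a definitive implementation:

```python
# Fit each new tree to the residuals of the current ensemble prediction.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

learning_rate = 0.1
n_rounds = 100

prediction = np.full_like(y, y.mean())   # start from a constant forecast
trees = []
for _ in range(n_rounds):
    residual = y - prediction            # e = y - y_forecasted
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                # new model with the error as target
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

def ensemble_predict(X_new):
    """Sum the constant start plus every tree's (scaled) correction."""
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out
```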

Regularization

Regularisation constrains the model to fewer effective degrees of freedom by shrinking the beta estimates towards zero in order to avoid overfitting; it steers us away from an overly complex or flexible model. It reduces the variance of the model considerably without a substantial increase in its bias. λ is the tuning parameter used to penalise the flexibility of the model: as λ increases, the beta estimates (all except the intercept) shrink, reducing the variance. Beyond a certain threshold, however, the bias starts increasing as the model loses important information, resulting in underfitting. The penalty bears most heavily on large coefficients, for example those attached to higher-order terms, so their influence is reduced. Lasso penalises the sum of the absolute values (modulus) of the beta estimates, known as the L1 norm, while Ridge penalises the sum of their squares, known as the L2 norm; that is, Lasso minimises RSS + λ Σ|βj| and Ridge minimises RSS + λ Σβj². When λ is large enough, Lasso can drive some of the beta estimates exactly to zero, resulting in feature selection, as the sketch below illustrates.
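A minimal sketch contrasting L1 (Lasso) and L2 (Ridge) shrinkage as λ grows, using scikit-learn, where λ is called alpha; the synthetic data and the chosen λ values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
true_beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only 3 informative features
y = X @ true_beta + rng.normal(scale=0.5, size=100)

for lam in (0.01, 0.1, 1.0, 10.0):
    lasso = Lasso(alpha=lam).fit(X, y)
    ridge = Ridge(alpha=lam).fit(X, y)
    print(f"lambda={lam:5}: "
          f"lasso zero coefs={np.sum(lasso.coef_ == 0)}, "
          f"ridge zero coefs={np.sum(ridge.coef_ == 0)}")

# Ridge shrinks coefficients but rarely hits exactly zero; Lasso zeroes out
# more and more of them as lambda increases, i.e. it performs feature selection.
```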

Bias-Variance tradeoff

Bias refers to the deviation of the predicted values from the correct values. It is the error that arises from wrong assumptions about the data, in other words from representing a complex real-life problem with a simpler model, even though the simpler model may be easier to understand. For instance, fitting a linear model to a non-linear problem. High bias results in underfitting: the model is not flexible enough. Parametric algorithms like linear regression can produce high bias, while non-parametric algorithms like decision trees make fewer assumptions about the training data and the target function and hence tend to have low bias. Variance refers to how much the model changes when it is trained on different training data. It arises when the model captures not just the underlying pattern but the noise as well, in other words when it memorises the data, and it results in overfitting. This is often observed in decision trees. When the observations are limited but the model is very flexible, variance tends to be high, as the sketch below illustrates.
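A minimal sketch of the tradeoff, assuming a noisy sine curve as the ground truth: a linear model (high bias, underfits) against an unconstrained decision tree (high variance, overfits). All data and model choices are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)

def make_data(n=100):
    X = rng.uniform(-3, 3, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)
    return X, y

X_test, _ = make_data(500)
true_f = np.sin(X_test[:, 0])

lin_preds, tree_preds = [], []
for _ in range(50):                      # retrain on 50 different training sets
    X_train, y_train = make_data()
    lin_preds.append(LinearRegression().fit(X_train, y_train).predict(X_test))
    tree_preds.append(DecisionTreeRegressor().fit(X_train, y_train).predict(X_test))

for name, preds in [("linear", lin_preds), ("deep tree", tree_preds)]:
    preds = np.array(preds)
    bias_sq = np.mean((preds.mean(axis=0) - true_f) ** 2)   # error of the average model
    variance = np.mean(preds.var(axis=0))                   # spread across training sets
    print(f"{name:9s}  bias^2={bias_sq:.3f}  variance={variance:.3f}")
```

The linear model's predictions barely move between training sets (low variance) but sit far from the sine curve (high bias); the deep tree is the reverse.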

Bayes Theorem Origins

In his book 'An Enquiry concerning Human Understanding', David Hume posited that inherently fallible evidence is insufficient proof against natural laws: eyewitness testimony can't prove a miracle. Bayes, a Presbyterian minister motivated to rebut him, wanted to understand how much evidence we would need to be persuaded that something is true, however improbable it seems at first. He developed an equation for updating our beliefs in the light of new evidence. His work, An Essay towards solving a Problem in the Doctrine of Chances, was edited and published after Bayes' death by Richard Price, who believed that Bayes' Theorem helped prove the existence of God. When we are presented with new information, we can use Bayes' Theorem to refine our pre-existing belief. It is fairly easy to determine the probability of an effect given a cause; because Bayes' Theorem works in the reverse direction, from effect back to cause, many find it complicated. The sketch below gives a small worked example of that reverse step.
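A small worked example of updating a belief with Bayes' theorem, P(cause | effect) = P(effect | cause) × P(cause) / P(effect); the scenario and numbers are purely hypothetical:

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(cause | positive evidence) for a binary cause."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# A test that detects a rare condition 99% of the time but also fires
# falsely 5% of the time, applied to a condition with 1% prevalence:
print(posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05))
# ~0.167: even strong evidence only moves a very improbable prior so far,
# which is why reasoning from effect back to cause feels unintuitive.
```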
Image: logistic regression curves in which the green line has a negative beta and the black line a positive beta.
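Since the image itself is missing, here is a minimal sketch that reproduces such a figure, assuming single-variable logistic curves with betas of +1 and -1 (the values and styling are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.linspace(-6, 6, 200)
plt.plot(x, sigmoid(1.0 * x), color="black", label="beta = +1")   # rises with x
plt.plot(x, sigmoid(-1.0 * x), color="green", label="beta = -1")  # falls with x
plt.xlabel("x")
plt.ylabel("P(y = 1 | x)")
plt.legend()
plt.show()
```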

AI

We are now at a critical juncture where many of the systems we need to master are fiendishly complex, from climate change to macroeconomic issues to Alzheimer's disease. The problem is that these challenges are so complex that even the world's top scientists, clinicians and engineers can struggle to master all the intricacies necessary to make the breakthroughs required. It has been said that Leonardo da Vinci was perhaps the last person to have lived who understood the entire breadth of knowledge of his age. Since then we've had to specialise, and today it takes a lifetime to completely master even a single field such as astrophysics or quantum mechanics. The systems we now seek to understand are underpinned by vast amounts of data, usually highly dynamic, non-linear and with emergent properties that make it incredibly hard to find the structure and connections needed to reveal the insights hidden therein. Kepler and Newton could write equations to describe the motion of planets; the systems we now face resist that kind of compact description.

GLM

In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables with distributions other than the normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. In many cases, when the response variable must be positive and can vary over a wide scale, constant input changes lead to geometrically varying rather than constantly varying output changes; a log link captures this multiplicative behaviour.
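A minimal sketch of fitting a GLM with a non-normal, positive response, assuming statsmodels is available; the count data generated here is illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 2, size=500)
# Counts whose mean grows multiplicatively with x: mu = exp(0.5 + 1.2 * x)
y = rng.poisson(np.exp(0.5 + 1.2 * x))

X = sm.add_constant(x)                               # design matrix with an intercept
model = sm.GLM(y, X, family=sm.families.Poisson())   # Poisson family, log link by default
result = model.fit()
print(result.summary())

# With the log link, a one-unit change in x multiplies the expected count
# by exp(coefficient) rather than adding a constant amount to it.
```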