Logistic regression does not make many of the key assumptions of linear regression and general linear models that are based on ordinary least squares algorithms – particularly regarding linearity, normality, homoscedasticity, and measurement level.


In addition, one of the assumptions of linear regression is that the variance of Y is constant across values of X (homoscedasticity). This cannot hold for a binary variable, because its variance is p(1-p), which changes with p. If 50 percent of the cases are 1s, the variance is .25, which is its maximum value. As p moves toward more extreme values, the variance decreases: for example, when p = .10, the variance is .1*.9 = .09, and as p approaches 1 or 0, the variance approaches 0.
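The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `bernoulli_variance` is made up for this example and is not part of any library.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 variable whose probability of being 1 is p."""
    return p * (1.0 - p)


# Illustrative probabilities, from extreme to balanced and back.
probabilities = (0.01, 0.10, 0.50, 0.90, 0.99)

for p in probabilities:
    print(f"p = {p:.2f}  variance = {bernoulli_variance(p):.4f}")
```

The variance peaks at p = .50 and shrinks toward 0 at both extremes, so it cannot be constant across values of X whenever p depends on X.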
 

Firstly, it does not need a linear relationship between the dependent and independent variables.  Logistic regression can handle all sorts of relationships, because it applies a non-linear log transformation to the predicted odds.

Secondly, the independent variables do not need to be multivariate normal – although multivariate normality yields a more stable solution.  Also, the error terms (the residuals) do not need to be normally distributed.

Thirdly, homoscedasticity is not needed.  Logistic regression does not need constant variances; the variances can be heteroscedastic for each level of the independent variables.

  Lastly, it can handle ordinal and nominal data as independent variables.  The independent variables do not need to be metric (interval or ratio scaled).


Logistic regression is similar to discriminant analysis.  Discriminant analysis uses the regression line to split a sample into two groups along the levels of the dependent variable.  Whereas logistic regression uses the concept of probabilities and log odds with a cut-off probability of 0.5, discriminant analysis cuts the geometrical plane that is represented by the scatter cloud.  The practical difference lies in the assumptions of the two methods.

Thirdly, the model should be fitted correctly.  Neither overfitting nor underfitting should occur.  That is, only the meaningful variables should be included, but all of the meaningful variables should be included.  A common approach to ensure this is to use a stepwise method to estimate the logistic regression.

Fourthly, the error terms need to be independent.  Logistic regression requires each observation to be independent.  That is, the data points should not come from any dependent samples design, e.g., before-after measurements or matched pairings.  Also, the model should have little or no multicollinearity.  That is, the independent variables should be independent of each other.  However, there is the option to include interaction effects of categorical variables in the analysis and the model.  If multicollinearity is present, centering the variables might fix it, i.e. subtracting the mean from each variable.  If this does not lower the multicollinearity, a factor analysis with orthogonally rotated factors should be done before the logistic regression is estimated.
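To illustrate the centering step, here is a minimal sketch under stated assumptions. It uses made-up data (the integers 1 to 10) and a hypothetical helper `pearson`. The example pairs a variable with its own square, because centering mainly helps when the collinearity comes from polynomial or interaction terms built from the same variable; it does not change the correlation between two distinct predictors.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predictor: the integers 1..10.
x = [float(v) for v in range(1, 11)]
x_squared = [v ** 2 for v in x]

# Centering: subtract the mean of the variable from every value.
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]
x_centered_squared = [v ** 2 for v in x_centered]

raw_corr = pearson(x, x_squared)
centered_corr = pearson(x_centered, x_centered_squared)

print(f"correlation of x with x^2 before centering: {raw_corr:.3f}")
print(f"correlation of x with x^2 after centering:  {centered_corr:.3f}")
```

In this symmetric toy example the correlation drops from nearly 1 to essentially 0; with real data the reduction is usually less complete.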


Fifthly, logistic regression assumes linearity between the independent variables and the log odds.  While it does not require the dependent and independent variables to be related linearly, it requires that the independent variables are linearly related to the log odds.  Otherwise the test underestimates the strength of the relationship and dismisses the relationship too easily, that is, it finds the relationship not significant (fails to reject the null hypothesis) where it should be significant.
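A small numerical sketch may make this assumption more concrete. The coefficients `b0` and `b1` below are hypothetical values chosen for illustration, not estimates from any data set. Under a logistic model, each one-unit step in x changes the log odds by the same amount, while the change in probability differs from step to step.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Map a probability to its log odds."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept and slope, for illustration only.
b0, b1 = -1.0, 0.5
xs = [0, 1, 2, 3, 4]

probs = [sigmoid(b0 + b1 * x) for x in xs]
log_odds = [logit(p) for p in probs]

# Change per one-unit step in x, on each scale.
prob_steps = [probs[i + 1] - probs[i] for i in range(len(xs) - 1)]
log_odds_steps = [log_odds[i + 1] - log_odds[i] for i in range(len(xs) - 1)]

for x, p, lo in zip(xs, probs, log_odds):
    print(f"x = {x}  probability = {p:.4f}  log odds = {lo:+.4f}")
```

The steps on the log-odds scale all equal `b1`, which is the sense in which the predictors must be linearly related to the log odds.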


Sometimes a probit model is used instead of a logit model.  The two functions can be compared over a range of values such as (-4, 4).  Both models are commonly used for binary outcomes; in most cases a model is fitted with both functions and the function with the better fit is chosen.  However, probit is based on the cumulative normal distribution, whereas logit is based on the cumulative logistic distribution.  Thus the difference between logit and probit is typically seen in small samples.
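The comparison over the range (-4, 4) can be sketched numerically with the standard library alone. The function names `logistic_cdf` and `normal_cdf` are chosen for this example. The curves are shown unscaled, so this is a rough illustration of their shapes rather than a comparison of fitted models.

```python
import math


def logistic_cdf(z: float) -> float:
    """Inverse logit link: cumulative logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z: float) -> float:
    """Inverse probit link: cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Evaluate both curves on integer points from -4 to 4.
grid = list(range(-4, 5))
table = [(z, logistic_cdf(z), normal_cdf(z)) for z in grid]

print("   z   logit   probit")
for z, logit_p, probit_p in table:
    print(f"{z:4d}  {logit_p:.4f}  {probit_p:.4f}")
```

Both curves pass through 0.5 at zero and are S-shaped; the unscaled normal curve approaches 0 and 1 faster, so the logistic curve has heavier tails.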

Maximum likelihood estimation: This method is used to estimate the odds ratios for the dependent variable.  In OLS estimation, we minimize the sum of squared errors, but in maximum likelihood estimation, we maximize the log likelihood.
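The idea of maximizing the log likelihood can be sketched on a tiny example. Everything here is an assumption made for illustration: the toy data, the function names, and the use of plain gradient ascent with a fixed step size. Statistical packages typically use faster Newton-type procedures, so this is not how any particular library does it.

```python
import math

# Hypothetical toy data: one predictor x and a binary outcome y.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0: float, b1: float) -> float:
    """Bernoulli log likelihood of the data for given coefficients."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(learning_rate: float = 0.05, steps: int = 5000):
    """Maximize the log likelihood by simple gradient ascent."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the log likelihood: sum of (observed - predicted),
        # weighted by the predictor for the slope.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals)
        g1 = sum(r * x for r, x in zip(residuals, xs))
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1


start_ll = log_likelihood(0.0, 0.0)
b0_hat, b1_hat = fit()
final_ll = log_likelihood(b0_hat, b1_hat)

print(f"log likelihood at b0 = 0, b1 = 0: {start_ll:.4f}")
print(f"log likelihood after fitting:     {final_ll:.4f}")
print(f"odds ratio for x (exp(b1)):       {math.exp(b1_hat):.4f}")
```

The exponentiated slope is the estimated odds ratio for a one-unit increase in x, which is the quantity the paragraph above refers to.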

Significance test: The Hosmer and Lemeshow chi-square test is used to test the overall goodness of fit of the model.  It is a modified chi-square test, which is generally considered better suited to this purpose than the traditional chi-square test.


OLS assumes that the errors are normally distributed, whereas in the generalized linear model framework to which logistic regression belongs, the distribution may be normal, Poisson or binomial.
