Granger causality test


The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another.

According to Granger causality, if a signal X1 "Granger-causes" (or "G-causes") a signal X2, then past values of X1 should contain information that helps predict X2 above and beyond the information contained in past values of X2 alone.

Suppose that we have three terms, Xt, Yt, and Wt, and that we first attempt to forecast Xt+1 using past terms of Xt and Wt. We then try to forecast Xt+1 using past terms of Xt, Yt, and Wt. If the second forecast is found to be more successful, according to standard cost functions, then the past of Y appears to contain information helping in forecasting Xt+1 that is not in past Xt or Wt. In particular, Wt could be a vector of possible explanatory variables. Thus, Yt would "Granger cause" Xt+1 if (a) Yt occurs before Xt+1; and (b) it contains information useful in forecasting Xt+1 that is not found in a group of other appropriate variables.

Berck, Peter, Jennifer Brown, Jeffrey Perloff and Sofia Villas-Boas (2008), "Sales: Tests of Theories of Causality and Timing", International Journal of Industrial Organization (26): 1257-1273, show that, despite sale patterns not being significantly different for national brands and private label brands, formal Granger causality analysis finds a sale of a national brand to be more likely to "cause" sales of other products than a sale of a private label product.

In another research study, a Granger causality test shows that, in the long term, housing price growth drives and has a substantial impact on the growth of retail sales of consumer goods.


To identify whether sales volume is an important driver of distribution, the following two regressions are fitted. In the unrestricted model, the dependent variable (DV) is distribution and the independent variables (IDVs) are lag(distribution), i.e. dist_lag1, and lag(volume), i.e. vol_lag1. In the restricted model, the only IDV is lag(distribution).
Unrestricted model
dist = dist_lag1 + vol_lag1 + e1
Restricted model
dist = dist_lag1 + e0
where e1 and e0 are the residuals of the two models.
The residuals from the two equations above are used to calculate the F statistic and, from it, the p-value, which helps to infer whether adding lagged volume improves the fit significantly. A small p-value suggests that volume Granger-causes distribution.
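The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the exact procedure used in any particular study: the data are synthetic, the helper name ols_rss is made up for this example, and an intercept is added to both models as an assumption (the equations above do not show one explicitly). The F statistic compares the residual sums of squares of the restricted and unrestricted fits.

```python
import numpy as np
from scipy import stats


def ols_rss(y, X):
    """Residual sum of squares of an OLS fit of y on X (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Synthetic data (assumption for illustration): lagged volume feeds into distribution.
rng = np.random.default_rng(0)
n = 200
vol = rng.normal(size=n)
dist = np.zeros(n)
for t in range(1, n):
    dist[t] = 0.5 * dist[t - 1] + 0.8 * vol[t - 1] + rng.normal(scale=0.5)

# Align the dependent variable with its one-period lags.
y = dist[1:]
dist_lag1 = dist[:-1]
vol_lag1 = vol[:-1]
const = np.ones_like(y)  # intercept term (assumption)

X_restricted = np.column_stack([const, dist_lag1])
X_unrestricted = np.column_stack([const, dist_lag1, vol_lag1])

rss0 = ols_rss(y, X_restricted)    # from residuals e0
rss1 = ols_rss(y, X_unrestricted)  # from residuals e1

q = 1                                        # number of restrictions (vol_lag1 dropped)
df_resid = len(y) - X_unrestricted.shape[1]  # observations minus estimated parameters
f_stat = ((rss0 - rss1) / q) / (rss1 / df_resid)
p_value = stats.f.sf(f_stat, q, df_resid)    # upper tail of the F distribution

print(f"F = {f_stat:.2f}, p-value = {p_value:.4g}")
```

Because the synthetic distribution series is built from lagged volume, the p-value here should come out very small; on real data the same comparison may well be inconclusive.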
Similarly, to identify whether:

Distribution drives Volume
 For the unrestricted model, the DV is volume and the IDVs are lag(distribution) and lag(volume).
 For the restricted model, the IDV is lag(volume).

Media Spend drives Distribution
 For the unrestricted model, the DV is distribution and the IDVs are lag(distribution) and lag(media spend), where media spend is raw GRP.
 For the restricted model, the IDV is lag(distribution).

Promotion drives Distribution
 For the unrestricted model, the DV is distribution and the IDVs are lag(distribution) and lag(promotion).
 For the restricted model, the IDV is lag(distribution).
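
Since every one of these tests has the same shape (DV regressed on its own lag, with and without the lag of the candidate driver), they can be run in a loop. The sketch below assumes synthetic series and a hypothetical helper granger_f written for this example; it uses one lag and an intercept, both of which are assumptions, and reports only the F statistic for each pair.

```python
import numpy as np


def granger_f(target, driver):
    """F statistic for 'driver drives target' with one lag (hypothetical helper)."""
    target = np.asarray(target, dtype=float)
    driver = np.asarray(driver, dtype=float)
    y = target[1:]
    own_lag = target[:-1]
    driver_lag = driver[:-1]
    const = np.ones_like(y)  # intercept term (assumption)

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    rss0 = rss(np.column_stack([const, own_lag]))              # restricted model
    rss1 = rss(np.column_stack([const, own_lag, driver_lag]))  # unrestricted model
    df_resid = len(y) - 3  # three estimated parameters in the unrestricted model
    return (rss0 - rss1) / (rss1 / df_resid)


# Synthetic series (assumption): media spend feeds distribution, distribution
# feeds volume, and promotion is unrelated noise.
rng = np.random.default_rng(1)
n = 300
media = rng.normal(size=n)
promotion = rng.normal(size=n)
distribution = np.zeros(n)
volume = np.zeros(n)
for t in range(1, n):
    distribution[t] = 0.5 * distribution[t - 1] + 0.6 * media[t - 1] + rng.normal(scale=0.5)
    volume[t] = 0.4 * volume[t - 1] + 0.7 * distribution[t - 1] + rng.normal(scale=0.5)

series = {
    "distribution": distribution,
    "volume": volume,
    "media_spend": media,
    "promotion": promotion,
}

# (driver, target) pairs matching the tests listed above.
pairs = [
    ("distribution", "volume"),
    ("media_spend", "distribution"),
    ("promotion", "distribution"),
]

results = {}
for driver, target in pairs:
    results[(driver, target)] = granger_f(series[target], series[driver])
    print(f"{driver} -> {target}: F = {results[(driver, target)]:.2f}")
```

Each F statistic would then be converted to a p-value against an F distribution with 1 and len(y) - 3 degrees of freedom before drawing any conclusion.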
