I have monthly data on website clicks and want to build a SARIMA model to predict the expected clicks for the next month. Because a SARIMA model needs stationary data, I transformed the series and ran the Augmented Dickey-Fuller (ADF) test in Python to detect when I can stop transforming and start feeding the data to the model (i.e., when the p-value < 0.05).
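To make the stopping rule concrete, this is the kind of check I mean (a minimal sketch; is_stationary and the alpha threshold are my own placeholder names, and adfuller runs with its defaults):

from statsmodels.tsa.stattools import adfuller

def is_stationary(series, alpha=0.05):
    # adfuller returns (statistic, p-value, usedlag, nobs, critical values, icbest)
    statistic, pvalue, usedlag, nobs, crit, icbest = adfuller(series.dropna())
    return pvalue < alpha  # rejecting the unit-root null -> treat as stationary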
Since the data is seasonal, do I need to set the maxlag parameter in adfuller() to 12, and why or why not?
I ran the ADF test with both settings:
- the default maxlag (see the sketch after this list for what the default actually is)
- maxlag=12
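For reference, if I read the statsmodels documentation and source correctly, the default maxlag follows Schwert's rule of thumb based on the sample size, so for a short monthly series it is usually close to, but not necessarily, 12:

import numpy

nobs = len(myTimeSeries)
# Schwert's rule, as used by adfuller; statsmodels may cap this further for very short series
default_maxlag = int(numpy.ceil(12.0 * (nobs / 100.0) ** 0.25))
print(default_maxlag)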
Of course, I get different p-values from the two variants:
import numpy
from statsmodels.tsa.stattools import adfuller

# myTimeSeries is a pandas Series with a monthly DatetimeIndex
myTimeSeries.plot()
adfuller(myTimeSeries) # p=0.113872
adfuller(myTimeSeries, maxlag=12) # p=0.996884
myLog = numpy.log(myTimeSeries)  # log transform
myLog.plot()
adfuller(myLog) # p=0.165395
adfuller(myLog, maxlag=12) # p=0.997394
myDiff = myLog.diff(1)  # first difference (lag 1)
myDiff.plot()
myDiff = myDiff.dropna()
adfuller(myDiff) # p=0.003884
adfuller(myDiff, maxlag=12) # p=0.613816
mySeasonalDiff = myDiff.diff(12)  # seasonal difference (lag 12)
mySeasonalDiff.plot()
mySeasonalDiff = mySeasonalDiff.dropna()
adfuller(mySeasonalDiff) # p=0.000000
adfuller(mySeasonalDiff, maxlag=12) # p=0.958532
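If it matters for the answer: my understanding from the statsmodels docs is that with the default autolag='AIC', maxlag is only the upper bound of the lag search, whereas autolag=None uses exactly maxlag lags. To see which lag the test actually picks, one can unpack the full return value, e.g.:

# autolag='AIC' (the default): maxlag only bounds the search, usedlag is the chosen lag
statistic, pvalue, usedlag, nobs, crit, icbest = adfuller(myDiff, maxlag=12)
print(pvalue, usedlag)

# autolag=None: exactly maxlag lags are used (note: one fewer return value, no icbest)
statistic, pvalue, usedlag, nobs, crit = adfuller(myDiff, maxlag=12, autolag=None)
print(pvalue, usedlag)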
It looks as if, with maxlag=12, I would need to transform my data further, whereas with the default maxlag I can stop after taking the log and the first difference. So I would like to know how to use the ADF test properly.
Thanks for your help.