
I am struggling to understand the concept of the p-value and the other results of the adfuller test.

The code I am using:

(I found this code on Stack Overflow.)

import os

import pandas as pd
import statsmodels.api as sm
import statsmodels.tsa.stattools as ts

loc = r"C:\Stock Study\Stock Research\Hist Data"
os.chdir(loc)
xl_file1 = pd.ExcelFile("HDFCBANK.xlsx")
xl_file2 = pd.ExcelFile("KOTAKBANK.xlsx")
y1 = xl_file1.parse("Sheet1")
x1 = xl_file2.parse("Sheet1")

x = x1['Close']
y = y1['Close']


def cointegration_test(y, x):
    # Step 1: regress one variable on the other
    # (sm.OLS adds no intercept by itself; sm.add_constant(x) would add one)
    ols_result = sm.OLS(y, x).fit()
    # Step 2: obtain the residuals (ols_result.resid)
    # Step 3: apply the Augmented Dickey-Fuller test to check whether
    #         the residuals have a unit root
    return ts.adfuller(ols_result.resid)


print(cointegration_test(y, x))

The output:

(-1.8481210964862593, 0.35684591783869046, 0, 1954, {'10%': -2.5675580437891359, '1%': -3.4337010293693235, '5%': -2.863020285222162}, 21029.870846458849)

If I understand the test correctly:

Value
adf : float Test statistic
pvalue : float MacKinnon’s approximate p-value based on MacKinnon (1994, 2010)
usedlag : int Number of lags used
nobs : int Number of observations used for the ADF regression and calculation of the critical values
critical values : dict Critical values for the test statistic at the 1 %, 5 %, and 10 % levels. Based on MacKinnon (2010)
icbest : float The maximized information criterion if autolag is not None.
resstore : ResultStore, optional (returned only when store=True)
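To see which number is which, here is a minimal pure-Python sketch that copies the printed tuple and attaches the field names from the list above. `ADFResult` is a hypothetical helper introduced only for illustration, not something statsmodels returns; it assumes the default `store=False`, which is why the tuple has six items rather than seven.

```python
from collections import namedtuple

# Hypothetical container; field names follow the adfuller return-value list.
ADFResult = namedtuple(
    "ADFResult", ["adf", "pvalue", "usedlag", "nobs", "critical_values", "icbest"]
)

# The tuple printed in the question (autolag on, store=False -> 6 items).
output = (
    -1.8481210964862593,
    0.35684591783869046,
    0,
    1954,
    {"10%": -2.5675580437891359, "1%": -3.4337010293693235, "5%": -2.863020285222162},
    21029.870846458849,
)

res = ADFResult(*output)
print(f"ADF statistic : {res.adf:.3f}")
print(f"p-value       : {res.pvalue:.3f}")
print(f"lags used     : {res.usedlag}")
print(f"observations  : {res.nobs}")
for level in ("1%", "5%", "10%"):
    print(f"critical {level:>3} : {res.critical_values[level]:.3f}")
print(f"best IC       : {res.icbest:.2f}")
```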

I am unable to fully understand the results and was hoping someone would be willing to explain them in layman's terms. All the explanations I am finding are very technical.

My interpretation is: they are cointegrated, i.e. we failed to reject the null hypothesis (i.e. that a unit root exists). The percentages are the confidence levels.

Am I completely wrong?

desertnaut
Sid
    I’m voting to close this question because it is not about programming as defined in the help center but about stats theory and/or methodology - please see https://stackoverflow.com/tags/statistics/info – desertnaut Dec 26 '21 at 11:44

3 Answers


Null hypothesis: the series has a unit root, i.e. it is non-stationary.

Alternative hypothesis: the series is stationary.

Data: (-1.8481210964862593, 0.35684591783869046, 0, 1954, {'10%': -2.5675580437891359, '1%': -3.4337010293693235, '5%': -2.863020285222162}, 21029.870846458849)

Let's go through the output one value at a time.

First data point: -1.8481210964862593: the ADF test statistic for your data (not a critical value).

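Second data point: 0.35684591783869046: the p-value, i.e. the probability of getting a test statistic at least as negative as this one if the null hypothesis (unit root) were true. It is not the probability that the null hypothesis is true.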

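Third data point: 0: the number of lagged difference terms used in the regression that produces the test statistic. With the default autolag='AIC', the lag length is chosen automatically, and here the AIC did not favor adding any lagged differences to the regression.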

Fourth data point: 1954: the number of observations used in the ADF regression.

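Fifth data point: {'10%': -2.5675580437891359, '1%': -3.4337010293693235, '5%': -2.863020285222162}: the critical values of the test statistic at the 1%, 5% and 10% significance levels.

Sixth data point: 21029.870846458849: the information criterion value (AIC by default) for the selected lag length (icbest).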
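For reference, the regression that adfuller fits can be written as below, assuming its default regression='c' (a constant, no trend). The first data point is the t-ratio of the estimated gamma, and the third data point is the number of lagged-difference terms p, which is 0 here, so the sum drops out.

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)}, \qquad H_1:\ \gamma < 0 \ \text{(stationary)},
\qquad \text{ADF statistic} = \frac{\hat\gamma}{\operatorname{SE}(\hat\gamma)}
```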

Since the test statistic (-1.85) is greater than all three critical values (-3.43 at 1%, -2.86 at 5% and -2.57 at 10%), the null hypothesis cannot be rejected at any of these significance levels. So you cannot conclude that your data is stationary.
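The comparison with the critical values can be written as a small sketch, using the numbers copied from the question's output; `reject_at_each_level` is a hypothetical helper name. Because the ADF test is left-tailed, H0 is rejected at a given level only when the statistic is below that level's critical value.

```python
# Values copied from the adfuller output in the question.
adf_stat = -1.8481210964862593
critical_values = {
    "1%": -3.4337010293693235,
    "5%": -2.863020285222162,
    "10%": -2.5675580437891359,
}


def reject_at_each_level(stat, crit):
    """Left-tailed test: reject H0 (unit root) at a level when stat < critical value."""
    return {level: stat < value for level, value in crit.items()}


decisions = reject_at_each_level(adf_stat, critical_values)
print(decisions)  # {'1%': False, '5%': False, '10%': False}
```

A statistic of, say, -3.0 would be rejected at 5% and 10% but not at 1%, which is why the level has to be chosen before looking at the result.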

The p-value tells the same story: 0.357 > 0.05 (at a 5% significance level, i.e. 95% confidence), so the null hypothesis cannot be rejected.
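The p-value rule as a minimal sketch; `adf_decision` is a hypothetical helper and `alpha=0.05` is only the conventional default. In statsmodels the p-value and the critical values come from separate MacKinnon approximations, so the two checks normally agree but can disagree slightly for a statistic very close to a critical value.

```python
def adf_decision(pvalue, alpha=0.05):
    """Return True if H0 (unit root) is rejected at significance level alpha."""
    return pvalue < alpha


pvalue = 0.35684591783869046  # p-value from the question's output
for alpha in (0.01, 0.05, 0.10):
    verdict = "reject H0" if adf_decision(pvalue, alpha) else "fail to reject H0"
    print(f"alpha={alpha:.2f}: {verdict}")
```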

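Hence the data is treated as non-stationary: a unit root cannot be ruled out, so its statistical properties (such as its mean and variance) may change over time. Since this series is the residual of your cointegration regression, this also means there is no evidence that the two price series are cointegrated.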
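To make "non-stationary" more concrete, here is a minimal standard-library sketch of the Dickey-Fuller statistic with a constant and zero lags, the case the question's output reports (usedlag = 0). It is meant to correspond to what adfuller computes for regression='c' with no lagged differences, but it is an illustration, not a replacement; `dickey_fuller_stat` is a hypothetical name. A simulated random walk (unit root) typically gives a statistic above the critical values, while white noise (stationary) gives a very negative one.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-ratio of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t  (constant, 0 lags)."""
    x = y[:-1]                                    # lagged level y_{t-1}
    d = [b - a for a, b in zip(y[:-1], y[1:])]    # first difference dy_t
    m = len(d)
    x_bar, d_bar = sum(x) / m, sum(d) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxd = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    gamma = sxd / sxx                             # OLS slope
    alpha = d_bar - gamma * x_bar                 # OLS intercept
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, d))
    se_gamma = math.sqrt(rss / (m - 2) / sxx)     # standard error of the slope
    return gamma / se_gamma


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(500)]     # white noise: stationary

random_walk = [0.0]
for e in noise[1:]:
    random_walk.append(random_walk[-1] + e)       # unit root: shocks accumulate

print(f"random walk : {dickey_fuller_stat(random_walk):.2f}")  # typically above -2.86
print(f"white noise : {dickey_fuller_stat(noise):.2f}")        # far below -2.86
```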

noob
    Sorry to revive an old post, but if the test value is lower than the 1%, 5% and 10% critical values and the p-value is also lower than 5%, is it possible (or normal) for the number of lags used to be 0? – Pedro Pablo Severin Honorato Sep 15 '20 at 15:51

What you stated in your question is correct. When you applied the ADF test to your OLS regression residuals, you were checking whether the residuals had a unit root, in other words, whether the residual series was stationary.

If your adfuller p-value is lower than a chosen significance level alpha (e.g. 5%), you can reject the null hypothesis (H0), because a test statistic that extreme would be unlikely to occur by chance if H0 were true. In your case the p-value is 0.357, which is above 0.05, so H0 cannot be rejected.
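One way to see what a p-value means is to simulate it: generate many series for which H0 is true (random walks), compute the Dickey-Fuller statistic for each, and count how often it is at least as negative as the observed -1.848. This sketch assumes a constant, zero lags and only n = 200 points per series to keep it fast; adfuller instead uses MacKinnon's approximation of this distribution, so the simulated value should land near 0.357, not exactly on it. `dickey_fuller_stat` and `simulated_pvalue` are hypothetical helpers.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-ratio of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t  (constant, 0 lags)."""
    x = y[:-1]
    d = [b - a for a, b in zip(y[:-1], y[1:])]
    m = len(d)
    x_bar, d_bar = sum(x) / m, sum(d) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxd = sum((xi - x_bar) * (di - d_bar) for xi, di in zip(x, d))
    gamma = sxd / sxx
    alpha = d_bar - gamma * x_bar
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, d))
    return gamma / math.sqrt(rss / (m - 2) / sxx)


def simulated_pvalue(observed, n, reps, seed=0):
    """Share of simulated random walks (H0 true) with a DF statistic <= observed."""
    rng = random.Random(seed)
    hits = 0
    for _rep in range(reps):
        y = [0.0]
        for _step in range(n - 1):
            y.append(y[-1] + rng.gauss(0, 1))
        if dickey_fuller_stat(y) <= observed:
            hits += 1
    return hits / reps


p_sim = simulated_pvalue(-1.8481210964862593, n=200, reps=1000)
print(f"simulated p-value: {p_sim:.3f}")  # should be roughly comparable to 0.357
```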

If H0 is rejected, the alternative hypothesis (Ha) can be accepted, which in this case would be: the residual series is stationary, i.e. the two series are cointegrated.

Here are the two hypotheses:

H0: the series is not stationary; it has a unit root. In other words, shocks to your residual never die out (e.g. a random walk, y(t) = y(t-1) + e(t), where each value carries forward everything that came before).

Ha: the series is stationary (that is normally what we want in regression analysis). Nothing more needs to be done.
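The practical difference between H0 and Ha can be simulated with the standard library only. This sketch assumes an AR(1) model y(t) = phi * y(t-1) + e(t) as the example (phi = 1 is a unit root, phi = 0.5 is stationary); `simulate` and `variance_at` are hypothetical helpers. Across many simulated paths, the random walk's variance keeps growing with t, while the stationary series settles around a constant variance of about 1 / (1 - 0.5^2) = 1.33.

```python
import random
import statistics


def simulate(phi, n, rng):
    """y_t = phi * y_{t-1} + e_t with y_0 = 0; phi = 1 gives a random walk."""
    y, path = 0.0, []
    for _ in range(n):
        y = phi * y + rng.gauss(0, 1)
        path.append(y)
    return path


def variance_at(phi, t_early, t_late, paths=500, seed=1):
    """Variance of y at two dates, computed across many simulated paths."""
    rng = random.Random(seed)
    sims = [simulate(phi, t_late, rng) for _ in range(paths)]
    early = statistics.pvariance([p[t_early - 1] for p in sims])
    late = statistics.pvariance([p[t_late - 1] for p in sims])
    return early, late


rw = variance_at(1.0, 50, 500)   # unit root: variance grows roughly like t
ar = variance_at(0.5, 50, 500)   # stationary: variance stays near 1.33
print("random walk (t=50, t=500):", [round(v, 2) for v in rw])
print("AR(1) 0.5   (t=50, t=500):", [round(v, 2) for v in ar])
```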

desertnaut
Philipe Riskalla Leal

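The typical way to reject the null hypothesis at a given significance level is for your test statistic (-1.85) to be less than (more negative than) the critical value for that level. In this case it is not less than any of your critical values (-3.43, -2.86, -2.57), so the null hypothesis cannot be rejected even at the 10% level.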

jtlz2
antonio_zeus
    Sorry, maybe it's a misunderstanding, correct me if I'm wrong. But in fact t_value is lower than the critical values, so the credibility we give to the unit root test is small. The series should be non-stationary: (i) p_value(0.35) > 0.05, and (ii) t_value is lower than the critical values – Izaskun Feb 11 '20 at 10:30
  • What happens if the t-test result (-1.84) is less than all critical values, which means non-stationarity, but the p-value is < 0.05, which means stationarity? I am facing this issue and they contradict each other – Luigi87 Jul 23 '21 at 13:33