Question about autocorrelation_plot result vs autocorr result

Question

I used autocorrelation_plot to plot the autocorrelation of a straight line:

import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot
import matplotlib.pyplot as plt

dr = pd.date_range(start='1984-01-01', end='1984-12-31')

df = pd.DataFrame(np.arange(len(dr)), index=dr, columns=["Values"])
autocorrelation_plot(df)
plt.show()

Then, I tried using autocorr() to calculate the autocorrelation with different lags:

for i in range(0,366):
    print(df['Values'].autocorr(lag=i))

The output is 1 (or 0.99) for every lag. But the correlogram clearly shows the autocorrelation as a curve rather than a straight line fixed at 1.

Did I interpret the correlogram incorrectly or did I use the autocorr() function incorrectly?

Sander van den Oord · Answer 1 · 2019-01-03T12:50:28.780


You are using both functions correctly, but autocorrelation_plot calculates autocorrelations in a different way than autocorr() does.

The following two posts explain more about the differences. Unfortunately, I don't know which way of calculating is the correct one:

What's the difference between pandas ACF and statsmodel ACF?

Why NUMPY correlate and corrcoef return different values and how to "normalize" a correlate in "full" mode?
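To make the difference concrete, here is a minimal NumPy-only sketch of the two calculations as I understand them. The helper names `pearson_autocorr` and `global_autocorr` are made up for illustration and are not pandas functions. The first treats the series and its shifted copy as two separate samples, each with its own mean and standard deviation (a Pearson correlation, which is what autocorr() reports). The second uses the mean and variance of the full series for every lag, which I am assuming matches what autocorrelation_plot draws.

```python
import numpy as np

# Same straight line as in the question: 0, 1, ..., 365
x = np.arange(366, dtype=float)


def pearson_autocorr(x, lag):
    """Pearson correlation between x[t] and x[t + lag].

    Each slice is centred and scaled with its own mean and std.
    """
    if lag == 0:
        return 1.0
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]


def global_autocorr(x, lag):
    """Lag autocovariance divided by the lag-0 autocovariance.

    Both use the mean of the full series (assumed to mirror the plot).
    """
    n = len(x)
    mean = x.mean()
    c0 = np.sum((x - mean) ** 2) / n
    return np.sum((x[: n - lag] - mean) * (x[lag:] - mean)) / n / c0


for lag in (1, 100, 300):
    print(lag, pearson_autocorr(x, lag), global_autocorr(x, lag))
```

For a straight line, any shifted slice is still a perfect linear function of the other slice, so the Pearson version stays at 1 for every lag. The version with the global mean decays as the lag grows and eventually turns negative, which is the curve seen in the correlogram.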

If you need it, you can get the autocorrelations out of your autocorrelation plot as follows:

ax = autocorrelation_plot(df)
ax.lines[5].get_data()[1]

edited Jan 03 '19 at 12:50

answered Jan 03 '19 at 12:43


Thanks for the links. From what I can tell, `autocorr()` calls `np.corrcoef()`, which calculates the Pearson correlation, and that is different from the autocorrelation. The implementation of `autocorrelation_plot` is correct. I have submitted [an issue on github](https://github.com/pandas-dev/pandas/issues/24608). – Cheng Jan 04 '19 at 08:54
