I used autocorrelation_plot to plot the autocorrelation of a straight line:

import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot
import matplotlib.pyplot as plt

dr = pd.date_range(start='1984-01-01', end='1984-12-31')

df = pd.DataFrame(np.arange(len(dr)), index=dr, columns=["Values"])
autocorrelation_plot(df)
plt.show()

[Figure: correlogram produced by autocorrelation_plot]

Then, I tried using autocorr() to calculate the autocorrelation with different lags:

for i in range(0,366):
    print(df['Values'].autocorr(lag=i))

The output is 1 (or 0.99) for every lag, but the correlogram clearly shows the autocorrelation as a curve rather than a straight line fixed at 1.

Did I interpret the correlogram incorrectly or did I use the autocorr() function incorrectly?

halfer
Cheng

1 Answer

You are using both functions correctly, but autocorrelation_plot calculates autocorrelations differently than autocorr() does.

The following two posts explain more about the differences. Unfortunately I don't know which way of calculating is the correct way:

What's the difference between pandas ACF and statsmodel ACF?

Why NUMPY correlate and corrcoef return different values and how to "normalize" a correlate in "full" mode?
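To make the difference concrete, here is a minimal sketch of the two calculations as far as I can tell from reading the pandas source. The helper names `pearson_autocorr` and `sample_autocorr` are made up for this illustration and are not pandas APIs, and the formulas are my reading of the implementation, so treat them as an assumption rather than a specification.

```python
import numpy as np
import pandas as pd

# Same straight line as in the question: 366 daily values for 1984.
values = pd.Series(np.arange(366, dtype=float))


def pearson_autocorr(series, lag):
    """Pearson correlation between the series and a shifted copy.

    Assumed to mirror Series.autocorr(): the shift creates NaNs, those
    pairs are dropped, and mean/variance are recomputed on the overlap.
    """
    return series.corr(series.shift(lag))


def sample_autocorr(series, lag):
    """Sample autocorrelation using the mean and variance of the FULL series.

    Assumed to mirror the curve drawn by autocorrelation_plot().
    """
    data = np.asarray(series, dtype=float)
    n = len(data)
    mean = data.mean()
    c0 = np.sum((data - mean) ** 2) / n
    return np.sum((data[: n - lag] - mean) * (data[lag:] - mean)) / n / c0


for lag in (1, 50, 100, 200):
    print(
        lag,
        values.autocorr(lag=lag),
        pearson_autocorr(values, lag),
        sample_autocorr(values, lag),
    )
```

For a straight line, any shifted copy is still perfectly linearly related to the original, so the Pearson version stays at 1. The second version keeps the overall mean fixed and divides by the full-series variance, so it decays with the lag and eventually turns negative, which matches the shape of the curve in the plot.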

If you need it, you can get the autocorrelations out of your autocorrelation plot as follows:

ax = autocorrelation_plot(df)
ax.lines[5].get_data()[1]
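The index 5 works because the plot draws five horizontal reference lines before the curve, which is an implementation detail. As a slightly more defensive sketch, assuming the curve is the only line with more than two points (true for the implementation I have looked at, but not guaranteed), you could pick the line with the most data points instead:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot

dr = pd.date_range(start="1984-01-01", end="1984-12-31")
series = pd.Series(np.arange(len(dr), dtype=float), index=dr, name="Values")

ax = autocorrelation_plot(series)

# Assumption: the horizontal reference lines have two points each, so the
# autocorrelation curve is the line holding the most points.
curve = max(ax.lines, key=lambda line: len(line.get_xdata()))
lags = np.asarray(curve.get_xdata())
acf = np.asarray(curve.get_ydata(), dtype=float)

print(lags[:5])
print(acf[:5])
```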
Sander van den Oord
  • Thanks for the links. From what I can tell, `autocorr()` calls `np.corrcoef()`, which calculates the Pearson correlation, and that is different from autocorrelation. The implementation of `autocorrelation_plot` is correct. I have submitted [an issue on github](https://github.com/pandas-dev/pandas/issues/24608). – Cheng Jan 04 '19 at 08:54