
I am attempting to calculate the rolling autocorrelation of a Series object using Pandas (0.23.3).

Setting up the example:

import numpy as np
import pandas as pd

dt_index = pd.date_range('2018-01-01', '2018-02-01', freq='B')
data = np.random.rand(len(dt_index))
s = pd.Series(data, index=dt_index)

Creating a Rolling object with window size = 5:

r = s.rolling(5)

This gives:

Rolling [window=5,center=False,axis=0]

Now, when I try to calculate the correlation (I'm pretty sure this is the wrong approach):

r.corr(other=r)

I get only NaNs.
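Correlating each window of a series with the same window of itself can only give 1.0, which is likely what the 1.0 values reported in the comments reflect; an autocorrelation needs the series paired with a shifted copy. A minimal sketch of that difference, assuming a plain Series is passed as `other` (the question passes the Rolling object `r` itself, whose handling is not verified here):

```python
import numpy as np
import pandas as pd

dt_index = pd.date_range('2018-01-01', '2018-02-01', freq='B')
s = pd.Series(np.random.rand(len(dt_index)), index=dt_index)

# Each window correlated with itself: every defined value is 1.0
# (up to floating-point rounding), so this is not an autocorrelation.
same = s.rolling(5).corr(s)

# A lag-1 correlation pairs the series with a copy shifted by one step.
# The first shifted value is NaN, so one more leading window is undefined.
lagged = s.rolling(5).corr(s.shift(1))

print(same.tail())
print(lagged.tail())
```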

I tried another approach based on the documentation:

df = pd.DataFrame()
df['a'] = s
df['b'] = s.shift(-1)
df.rolling(window=5).corr()

I get something like:

...
2018-03-01 a NaN NaN
           b NaN NaN
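With no `other` argument, `df.rolling(window=5).corr()` returns a full 2x2 correlation matrix for every date, stacked under a (date, column) MultiIndex, so the number of interest is the `a`/`b` entry. Because `b` is `s.shift(-1)`, its last value is NaN, so the final window has only 4 complete pairs and that entry is NaN on the last date. A minimal sketch of extracting the lag-1 series, using the question's setup with random data (note that each window here spans 6 consecutive observations of `s`, so it is not identical to correlating within a single 5-value window):

```python
import numpy as np
import pandas as pd

dt_index = pd.date_range('2018-01-01', '2018-02-01', freq='B')
s = pd.Series(np.random.rand(len(dt_index)), index=dt_index)

df = pd.DataFrame({'a': s, 'b': s.shift(-1)})

# No `other`: one 2x2 correlation matrix per date, indexed by (date, column).
pairwise = df.rolling(window=5).corr()

# The 'a' column of the 'b' rows is the rolling a-vs-b correlation.
lag1_from_pairwise = pairwise.xs('b', level=1)['a']

# The same values without building the full matrices.
lag1 = df['a'].rolling(window=5).corr(df['b'])

# The last date is NaN: 'b' is NaN there, leaving 4 pairs in a 5-wide window.
print(lag1.tail())
```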

I'm really not sure where I'm going wrong with this. Any help would be immensely appreciated! My data is float64, as in the docs. Could it be that the correlation is very close to zero and is therefore shown as NaN? Somebody raised a bug report about this here, but I think jreback resolved the problem in an earlier bug fix.

This is another relevant answer, but it uses pd.rolling_apply, which does not seem to be available in Pandas 0.23.3.

Abhay Nainan
  • I tried your first approach, and only the first 4 values are NaN, which makes sense since your window size is 5 and at least 5 elements are required. Output: `2018-01-01 NaN 2018-01-02 NaN 2018-01-03 NaN 2018-01-04 NaN 2018-01-05 1.0 2018-01-08 1.0 2018-01-09 1.0 2018-01-10 1.0 2018-01-11 1.0 2018-01-12 1.0 2018-01-15 1.0 2018-01-16 1.0 2018-01-17 1.0 2018-01-18 1.0 2018-01-19 1.0 2018-01-22 1.0 2018-01-23 1.0 2018-01-24 1.0 2018-01-25 1.0 2018-01-26 1.0 2018-01-29 1.0 2018-01-30 1.0 2018-01-31 1.0 2018-02-01 1.0` – Vikas NS Jul 21 '18 at 10:30
  • @vikasns I'm a bit suspicious of the fact that the returned values are all 1s. I don't have a mathematical proof at hand, but values drawn from a uniform random distribution being perfectly autocorrelated is very unintuitive to me. Could you perhaps post your implementation as an answer? – Abhay Nainan Jul 21 '18 at 15:10
  • Have you tried **s.rolling(5).apply(pd.Series.autocorr)**? – Stefano Giannini Oct 04 '22 at 17:17

2 Answers


IIUC,

>>> s.rolling(5).apply(lambda x: x.autocorr(), raw=False)

2018-01-01         NaN
2018-01-02         NaN
2018-01-03         NaN
2018-01-04         NaN
2018-01-05   -0.502455
2018-01-08   -0.072132
2018-01-09   -0.216756
2018-01-10   -0.090358
2018-01-11   -0.928272
2018-01-12   -0.754725
2018-01-15   -0.822256
2018-01-16   -0.941788
2018-01-17   -0.765803
2018-01-18   -0.680472
2018-01-19   -0.902443
2018-01-22   -0.796185
2018-01-23   -0.691141
2018-01-24   -0.427208
2018-01-25    0.176668
2018-01-26    0.016166
2018-01-29   -0.876047
2018-01-30   -0.905765
2018-01-31   -0.859755
2018-02-01   -0.795077
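To make explicit what the lambda computes on each window: `Series.autocorr()` with its default `lag=1` is the Pearson correlation between the series and a copy of itself shifted by one step, so on a 5-value window it correlates the 4 consecutive pairs. A quick check on a single window of made-up values:

```python
import numpy as np
import pandas as pd

x = pd.Series([0.2, 0.9, 0.4, 0.7, 0.1])  # one hypothetical 5-value window

# What the lambda in the answer calls on each window (lag=1 by default).
via_pandas = x.autocorr()

# The same quantity by hand: Pearson correlation of x[1:] against x[:-1].
v = x.to_numpy()
via_numpy = np.corrcoef(v[1:], v[:-1])[0, 1]

print(via_pandas, via_numpy)
```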
rafaelc
  • Many thanks. Any idea why rolling_obj.corr(r) isn't working? Could you explain your understanding of the `other` and `pairwise` arguments? – Abhay Nainan Jul 21 '18 at 15:12
  • Because you'd be doing `corr` between a window of 5 objects and a whole series. Not exactly what you intended, is it? ;) – rafaelc Jul 21 '18 at 18:27
  • Is there a faster way? This takes quite a while to run when the DataFrame is 100k rows long. Thanks. – user1234440 Mar 11 '19 at 21:18
  • `df.rolling(5).apply(lambda x: pd.Series(x).autocorr())` works too. – moue Sep 01 '19 at 00:33
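On the speed question: `raw=False` builds a pandas Series for every window, which adds overhead on long inputs. Passing `raw=True` with a NumPy-only function (for example `lambda v: np.corrcoef(v[1:], v[:-1])[0, 1]`) already avoids that, and the per-window Pearson computation can also be vectorized across all windows at once. A minimal sketch of the latter, assuming NumPy 1.20 or later (for `sliding_window_view`) and a float Series without missing values; `rolling_autocorr` is a hypothetical helper, not a pandas API:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def rolling_autocorr(s, window, lag=1):
    """Hypothetical helper: lag-`lag` Pearson autocorrelation of each trailing
    window, intended to match s.rolling(window).apply(lambda x: x.autocorr(lag), raw=False)."""
    if not 1 <= lag < window:
        raise ValueError("need 1 <= lag < window")
    values = np.asarray(s, dtype=float)
    out = np.full(len(values), np.nan)  # leading incomplete windows stay NaN
    if len(values) >= window:
        w = sliding_window_view(values, window)  # shape (n - window + 1, window)
        a, b = w[:, lag:], w[:, :-lag]           # pairs x[t] and x[t - lag] per window
        a_c = a - a.mean(axis=1, keepdims=True)  # each side centered on its own mean,
        b_c = b - b.mean(axis=1, keepdims=True)  # as in a Pearson correlation
        num = (a_c * b_c).sum(axis=1)
        den = np.sqrt((a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1))
        with np.errstate(invalid="ignore", divide="ignore"):
            out[window - 1:] = num / den         # constant windows give NaN/inf
    return pd.Series(out, index=s.index)


dt_index = pd.date_range('2018-01-01', '2018-02-01', freq='B')
s = pd.Series(np.random.rand(len(dt_index)), index=dt_index)
fast = rolling_autocorr(s, 5)
slow = s.rolling(5).apply(lambda x: x.autocorr(), raw=False)
print(np.allclose(fast, slow, equal_nan=True))
```

The windows themselves are views, but the centered arrays are copies of about `window` times the input size, which should be manageable for 100k rows and small windows.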

This is a lot faster than Pandas' `autocorr`, but the results differ: on my dataset, the results of the two methods have a Pearson correlation of 0.87. There is a discussion about why the results are different here.

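from statsmodels.tsa.stattools import acf
# Lag-1 ACF of each 5-value window; raw=True passes plain NumPy arrays, which is faster.
# Newer statsmodels releases renamed `unbiased` to `adjusted`; older ones need unbiased=True.
s.rolling(5).apply(lambda x: acf(x, adjusted=True, fft=False)[1], raw=True)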

Note that the input cannot contain null values; otherwise, it will return all nulls.
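One likely reason the two methods disagree is normalization. Pandas' `autocorr` is a Pearson correlation of `x[1:]` against `x[:-1]`, each with its own mean and standard deviation, whereas the ACF estimator (as I read the statsmodels `acovf` definition with `adjusted=True`) uses one mean and one variance for the whole window and divides the lag-1 sum by n - 1. On 5-value windows that difference is large. A sketch of both formulas in plain NumPy; the helper names are hypothetical:

```python
import numpy as np


def acf_lag1_adjusted(x):
    """Hypothetical helper: lag-1 sample ACF with a single window mean and variance,
    the lag-1 sum divided by n - 1 and the lag-0 sum by n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = (xc ** 2).sum() / n
    gamma1 = (xc[1:] * xc[:-1]).sum() / (n - 1)
    return gamma1 / gamma0


def pearson_lag1(x):
    """Hypothetical helper: lag-1 autocorrelation as Series.autocorr() computes it,
    i.e. Pearson correlation of x[1:] with x[:-1], each side with its own mean and std."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[1:], x[:-1])[0, 1]


trend = [1, 2, 3, 4, 5]
print(acf_lag1_adjusted(trend))  # 0.5
print(pearson_lag1(trend))
```

For the linear window `[1, 2, 3, 4, 5]`, the shifted pairs lie on a straight line, so `pearson_lag1` gives 1.0 up to rounding, while the single-mean ACF estimate is only 0.5.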

BrunoF