0

I'm testing some autocorrelation implementations found in SO. I came across two great answers by Joe Kington and unubtu. Both are very similar, except for the normalization used.

The former uses a max value, and the latter the variance. The results are quite different for some random uniform data as seen below.

import numpy as np
from statsmodels.tsa.stattools import acf
import matplotlib.pyplot as plt

def acorr_unutbu(x):
    x = x - x.mean()
    autocorr = np.correlate(x, x, mode='full')[-x.size:]
    # Normalization
    autocorr /= (x.var() * (np.arange(x.size, 0, -1)))
    return autocorr

def acorr_joe(x):
    x = x - x.mean()
    # The original answer uses [x.size:], I changed it to [-x.size:] to match
    # the size of the other function
    autocorr = np.correlate(x, x, mode='full')[-x.size:]
    # Normalization
    autocorr /= autocorr.max()
    return autocorr

N = 1000
data = np.random.rand(N)

ac_joe = acorr_joe(data)
ac_unubtu = acorr_unutbu(data)

fig, axes = plt.subplots(nrows=2)
axes[0].plot(ac_joe, label="joe")
axes[0].legend()
axes[1].plot(ac_unubtu, c='r', label="unutbu")
axes[1].legend()
plt.show()

enter image description here

I can compare these two function with the statsmodels autocorrelation function acf, which shows that Joe's answer (with a minor modification shown in the code above) is using probably the correct normalization.

# Compare with statsmodels lags
lags = acf(data, nlags=N)

fig, axes = plt.subplots(nrows=2)
axes[0].plot(ac_joe - lags, label="joe - SM")
axes[0].set_ylim(-.5, .5)
axes[0].legend()
axes[1].plot(ac_unubtu - lags, c='r', label="unutbu - SM")
axes[1].set_ylim(-.5, .5)
axes[1].legend()
plt.show()

enter image description here

What is the reason for the different normalization values used in these two autocorrelation functions?

Gabriel
  • 40,504
  • 73
  • 230
  • 404
  • 1
    I don't think we can say one or the other is correct, statsmodels has a `unbiased` option to switch. AFAIK, it's a bit like degrees of freedom correction when computing standard errors, without it's the maximum likelihood estimate and with it's the unbiased estimate of the variance. However, I don't remember specifics for acf. – Josef Jun 18 '18 at 21:52
  • I checked with the `autocorr.function()` of the [emcee](http://dfm.io/emcee/current/) package and it coincides with `statsmodels` and Joe's answer, so I'm more inclined to think it is the correct form. I trust unubtu's answers very much though, so I'm not 100% sure. – Gabriel Jun 19 '18 at 00:06

0 Answers0