I'm testing some autocorrelation implementations found on SO. I came across two great answers by Joe Kington and unutbu. Both are very similar, except for the normalization used. The former normalizes by the maximum of the raw autocorrelation (its lag-0 value), while the latter divides by the variance times the number of overlapping points at each lag. The results are quite different for some random uniform data, as seen in the script below.
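To see just the denominators in isolation before the full script, here is a tiny sketch of my own (not from either answer; both answers start from the same raw np.correlate output, only the denominators differ):

import numpy as np

x = np.random.rand(8)
x = x - x.mean()
c = np.correlate(x, x, mode='full')[-x.size:]
denom_joe = np.full(x.size, c.max())               # c_0 at every lag
denom_unutbu = x.var() * np.arange(x.size, 0, -1)  # shrinks as the lag grows
print(denom_joe[0], denom_unutbu[0])    # (nearly) identical at lag 0
print(denom_joe[-1], denom_unutbu[-1])  # differ by a factor of N at the last lag

With that in mind, here is the full comparison: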
import numpy as np
from statsmodels.tsa.stattools import acf
import matplotlib.pyplot as plt
def acorr_unutbu(x):
    x = x - x.mean()
    # Full autocorrelation; keep the last x.size entries (lags 0..N-1)
    autocorr = np.correlate(x, x, mode='full')[-x.size:]
    # Normalization: variance times the number of overlapping points at each lag
    autocorr /= (x.var() * (np.arange(x.size, 0, -1)))
    return autocorr
def acorr_joe(x):
    x = x - x.mean()
    # The original answer slices with [x.size:]; I changed it to [-x.size:]
    # so the output has the same length as the other function
    autocorr = np.correlate(x, x, mode='full')[-x.size:]
    # Normalization: divide by the maximum, i.e. the lag-0 value
    autocorr /= autocorr.max()
    return autocorr
N = 1000
data = np.random.rand(N)
ac_joe = acorr_joe(data)
ac_unutbu = acorr_unutbu(data)
fig, axes = plt.subplots(nrows=2)
axes[0].plot(ac_joe, label="joe")
axes[0].legend()
axes[1].plot(ac_unutbu, c='r', label="unutbu")
axes[1].legend()
plt.show()
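Since both functions share the same numerator, the two outputs should differ at lag k by exactly the ratio of the denominators, N / (N - k). A quick sanity check of my own, using the arrays from the script above:

k = np.arange(N)
# ac_joe divides c_k by N * var, ac_unutbu by (N - k) * var
print(np.allclose(ac_unutbu, ac_joe * N / (N - k)))  # should print True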
I can compare these two functions with the statsmodels autocorrelation function acf, which shows that Joe's answer (with the minor modification noted in the code above) probably uses the correct normalization.
# Compare with the statsmodels acf values
# nlags=N-1 gives N values (lags 0..N-1), matching the length of ac_joe
ac_sm = acf(data, nlags=N - 1)
fig, axes = plt.subplots(nrows=2)
axes[0].plot(ac_joe - ac_sm, label="joe - SM")
axes[0].set_ylim(-.5, .5)
axes[0].legend()
axes[1].plot(ac_unutbu - ac_sm, c='r', label="unutbu - SM")
axes[1].set_ylim(-.5, .5)
axes[1].legend()
plt.show()
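The agreement can also be quantified instead of eyeballed; a small check of my own on the arrays above:

# If Joe's normalization matches statsmodels', the first value should be
# at floating-point noise level; the second should be much larger
print(np.abs(ac_joe - ac_sm).max())
print(np.abs(ac_unutbu - ac_sm).max())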
What is the reason for the different normalizations used in these two autocorrelation functions?