What is the difference between the Autocorrelation functions provided by statsmodels, scipy & numpy?

Question

I can see that several Python libraries provide functions for computing the autocorrelation of a signal.

I've tried the following three functions, and each gives a different output for the sample x = [22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51]

1. Using statsmodels

import matplotlib.pyplot as plt
import statsmodels.tsa.stattools

res = statsmodels.tsa.stattools.acf(x)
plt.plot(res)
plt.show()

2. Using scipy

import scipy.signal as signal

res = signal.correlate(x, x, mode = 'same')
res_au = (res-min(res))/(max(res)-min(res))
plt.plot(res_au)
plt.show()

3. Using numpy

import numpy
res = numpy.correlate(x, x, mode='same')
res_norm = (res-min(res))/(max(res)-min(res))
plt.plot(res_norm)
plt.show()

Can anyone please explain the differences between them and when each should be used?

My objective is to compute the autocorrelation of a single channel with itself.

Always Right Never Left · Answer 1 · 2022-03-17T00:35:15.967

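Your confusion stems from the difference between the statistical definition of autocorrelation (statsmodels.tsa.stattools.acf) and the signal-processing definition (scipy.signal.correlate, numpy.correlate). The statistical version subtracts the mean and divides by the variance, so the result lies in the interval [-1, 1] and equals 1 at lag 0. The signal-processing version is the raw sliding dot product, with no mean removal or scaling. Your min-max normalization does not convert one into the other: it only rescales the raw values onto [0, 1], and mode='same' additionally returns both negative and positive lags centered on lag 0.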
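To make the distinction concrete, here is a minimal sketch that uses only numpy and the sample x from the question. The variable names (raw, minmax, acf) are just for illustration. It computes the raw correlation, the min-max scaled version, and the demeaned, variance-normalized version side by side:

```python
import numpy as np

x = np.array([22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51], dtype=float)
n = len(x)

# Signal-processing definition: raw sliding dot product, no mean removal, no scaling.
# With mode="full" the lags run from -(n-1) to (n-1); lag 0 sits at index n-1.
raw = np.correlate(x, x, mode="full")

# Min-max scaling (as in the question) only maps the raw values onto [0, 1].
minmax = (raw - raw.min()) / (raw.max() - raw.min())

# Statistical definition: remove the mean, keep non-negative lags,
# then divide by the lag-0 value (which equals n * variance).
xd = x - x.mean()
cov = np.correlate(xd, xd, mode="full")[n - 1:]
acf = cov / cov[0]

print("min-max scaled, lags 0..3:", minmax[n - 1:][:4])
print("statistical ACF, lags 0..3:", acf[:4])
```

Because x is strictly positive, every raw value is positive and the min-max curve never drops below 0, whereas the statistical ACF of this trending series goes negative at larger lags.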

Example using numpy.correlate to match output of statsmodels.tsa.stattools.acf:

import numpy as np
import matplotlib.pyplot as plt

x = np.array([22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51])

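def acorr(x, lags):
    # Remove the mean, as the statistical definition requires
    x_demeaned = x - x.mean()
    # Keep non-negative lags only and scale so that lag 0 equals 1
    corr = np.correlate(x_demeaned, x_demeaned, 'full')[len(x) - 1:] / (np.var(x) * len(x))

    return corr[:len(lags)]
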

plt.plot(acorr(x, range(len(x))))
plt.show()
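As a sanity check, the function above can be compared with a direct loop over the textbook definition. This is only a sketch: acf_direct is a hypothetical helper written for this comparison, not part of any library. It uses the biased estimator (the same denominator for every lag), which I believe is also what statsmodels' acf does by default.

```python
import numpy as np

def acorr(x, lags):
    # Same function as in the answer above
    x_demeaned = x - x.mean()
    corr = np.correlate(x_demeaned, x_demeaned, 'full')[len(x) - 1:] / (np.var(x) * len(x))
    return corr[:len(lags)]

def acf_direct(x, nlags):
    # Hypothetical helper: sum of products of the demeaned series with itself
    # shifted by k samples, divided by the sum of squares (the lag-0 term).
    xd = x - x.mean()
    denom = np.sum(xd ** 2)
    return np.array([np.sum(xd[k:] * xd[:len(x) - k]) / denom
                     for k in range(nlags + 1)])

x = np.array([22, 24, 25, 25, 28, 29, 34, 37, 40, 44, 51, 48, 47, 50, 51], dtype=float)

a = acorr(x, range(len(x)))
b = acf_direct(x, len(x) - 1)
print(np.allclose(a, b))
```

If the two agree, the numpy.correlate shortcut is just a vectorized form of the definition, so either can be used depending on which reads more clearly.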

Related question: How can I use numpy.correlate to do autocorrelation?
