
I am not sure if autocorrelation is the correct term to use, but I would like a fast method that, for a numpy array c, calculates an average in the form of an array a with entries

a[n] = <c[n+k] c[k]>, where k runs over the entire array and the average is taken over all these starting points (my notation may be a bit off, but I hope it makes sense).

As an example I would like it to return the following for the array c = [1,2,3,4]

a = [np.mean([1*1, 2*2, 3*3, 4*4]), np.mean([1*2, 2*3, 3*4]), np.mean([1*3, 2*4]), np.mean([1*4])]

Is there a way to calculate such an average using built-in python functions?

Fuffi

1 Answer


As a first solution you can use a list comprehension (i.e. a for loop):

import numpy as np

n = 4
a = np.arange(1, n+1, dtype=float)

r = np.array([np.mean(a[:n-i] * a[i:]) for i in range(n)])
r
array([7.5       , 6.66666667, 5.5       , 4.        ])

Going further, I think the "trick" in here, with a minor correction, should work:

def autocorr(x):
    n = x.size
    # The full cross-correlation of x with itself has length 2n - 1;
    # keep the non-negative lags, then divide by the number of
    # overlapping terms at each lag to turn the sums into means.
    result = np.correlate(x, x, mode='full')
    return result[result.size // 2:] / np.arange(n, 0, -1)

autocorr(a)
array([7.5       , 6.66666667, 5.5       , 4.        ])

Comparing time performances:

n = 10_000
a = np.arange(1, n+1, dtype=float)

%timeit np.array([np.mean(a[:n-i] * a[i:]) for i in range(n)])
104 ms ± 6.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit autocorr(a)
16.5 ms ± 709 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Hence the second approach is ~6 times faster on an array of size 10,000.
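If the array grows much larger, np.correlate itself becomes the bottleneck because it is an O(n²) time-domain computation. As a further sketch (my own addition, not part of the original answer, and the function name autocorr_fft is made up here), the same averaged autocorrelation can be computed in O(n log n) with an FFT, since correlation in the time domain corresponds to multiplying by the complex conjugate in the frequency domain:

```python
import numpy as np

def autocorr_fft(x):
    """FFT-based variant of the averaged autocorrelation above.

    Zero-padding to length 2n turns the circular correlation produced
    by the FFT into the linear correlation we want for lags 0..n-1.
    """
    n = x.size
    f = np.fft.rfft(x, 2 * n)           # FFT of x, zero-padded to 2n
    acf = np.fft.irfft(f * np.conj(f))  # |F|^2 back-transformed = correlation sums
    # Keep lags 0..n-1 and divide by the number of overlapping terms
    # at each lag to get the mean, as in autocorr above.
    return acf[:n] / np.arange(n, 0, -1)

a = np.arange(1, 5, dtype=float)
autocorr_fft(a)  # matches the results above
```

For the small example c = [1, 2, 3, 4] this agrees with both approaches; the payoff only shows up for large n, where the FFT route scales much better than np.correlate.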

FBruzzesi