I am writing numpy code to calculate autocorrelation. I am trying to improve the performance of my implementation.
I have tried two approaches: matrix multiplication on array views, and dot products on array slices in a for loop. To my surprise, the first approach seems much slower.
The function takes a vector x and a maximum shift k, and returns the dot product of the vector with a shifted version of itself for every shift i.
import numpy as np

def acorr_aview(x, k):
    return np.dot([x[i:-k+i] for i in range(k)], x[:-k])

def acorr_loop(x, k):
    return np.array([np.dot(x[i:-k+i], x[:-k]) for i in range(k)])
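Both functions should return the same values, so the difference is purely about speed. A quick check like the following (with an arbitrary small input; the names are just illustrative) confirms they agree:

x_small = np.random.randn(500)
assert np.allclose(acorr_aview(x_small, 10), acorr_loop(x_small, 10))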
I was expecting acorr_aview to have better performance because it uses matrix multiplication, but the opposite seems to be the case.
x = np.random.randn(10000)
k = 100

%timeit acorr_aview(x, k)
3.32 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit acorr_loop(x, k)
753 µs ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Why is acorr_loop much faster? Thanks.
Edit: For comparison:
A = np.random.randn(9900, 100)
v = np.random.randn(100)

%timeit np.dot(A, v)
1.08 ms ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.array([np.dot(a, v) for a in A])
12.4 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
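In case it helps narrow things down: unlike A above, acorr_aview has to assemble its 2-D array from a list of views on every call. A variant that builds the shifted matrix as a single strided view instead (using np.lib.stride_tricks.sliding_window_view, available in NumPy 1.20+) would separate the cost of assembling the array from the cost of the multiplication itself. This is only a sketch I have not benchmarked:

from numpy.lib.stride_tricks import sliding_window_view

def acorr_sliding(x, k):
    # rows 0..k-1 of the sliding windows are exactly x[i:len(x)-k+i]
    windows = sliding_window_view(x, len(x) - k)[:k]
    return np.dot(windows, x[:-k])

I include it only to show what I mean by multiplying against the views directly, without first copying them into a new array.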