I am interested in computing statistics over rolling windows on large 1D numpy arrays. For small window sizes, using numpy strides (via numpy.lib.stride_tricks.sliding_window_view) is faster than the pandas rolling-window implementation, but for large window sizes the opposite is true.
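For reference, sliding_window_view turns a 1D array into a 2D strided view whose rows are the overlapping windows, so reducing along axis=1 gives the rolling statistic. A toy example (values chosen purely for illustration):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
x = np.arange(5)
sliding_window_view(x, 3)
# array([[0, 1, 2],
#        [1, 2, 3],
#        [2, 3, 4]])
np.mean(sliding_window_view(x, 3), axis=1)
# array([1., 2., 3.])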
Consider the following:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
import pandas as pd
data = np.random.randn(10**6)
data_pandas = pd.Series(data)
window = 2
%timeit np.mean(sliding_window_view(data, window), axis=1)
# 19.3 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit data_pandas.rolling(window).mean()
# 34.3 ms ± 688 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
window = 1000
%timeit np.mean(sliding_window_view(data, window), axis=1)
# 302 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit data_pandas.rolling(window).mean()
# 31.7 ms ± 958 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
result_numpy = np.mean(sliding_window_view(data, window), axis=1)
result_pandas = data_pandas.rolling(window).mean()[window-1:]
np.allclose(result_numpy, result_pandas)
# True
For the larger window, the pandas timing is essentially unchanged (~32 ms), while the numpy implementation slows down by more than an order of magnitude (~300 ms).
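My understanding (which may well be wrong) is that the strided view is cheap to create, but np.mean still has to read every element of every window, so the numpy approach does O(N * window) work, whereas pandas appears to use some O(N) running aggregation. The shape and strides of the view illustrate how much data gets visited:
view = sliding_window_view(data, window)
view.shape    # (999001, 1000): np.mean visits ~10**9 elements
view.strides  # (8, 8): the rows overlap within the same underlying buffer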
What is pandas doing under the hood, and how can I get comparable performance for large windows using numpy?
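For what it's worth, a cumulative-sum approach along the following lines (my own sketch, not necessarily what pandas does internally, and with no handling of NaNs or accumulated floating-point error) seems to remove the dependence on the window size, but I don't know whether it is the idiomatic solution:
def rolling_mean_cumsum(a, window):
    # Hypothetical helper, not part of numpy or pandas: compute each window
    # sum as a difference of cumulative sums, O(N) work regardless of window.
    csum = np.cumsum(a, dtype=np.float64)
    out = np.empty(a.size - window + 1)
    out[0] = csum[window - 1]
    out[1:] = csum[window:] - csum[:-window]
    return out / window

np.allclose(rolling_mean_cumsum(data, window), result_numpy)
# expected True (up to floating-point error)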