I am interested in computing statistics over rolling windows on large 1D numpy arrays. For small window sizes, using numpy strides (via numpy.lib.stride_tricks.sliding_window_view) is faster than the pandas rolling-window implementation, but for large window sizes the opposite is true.
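For reference, sliding_window_view turns a 1D array into a 2D strided view whose rows are the overlapping windows, so reducing along axis=1 gives the rolling statistic. A toy example (values chosen purely for illustration):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
x = np.arange(5)
sliding_window_view(x, 3)
# array([[0, 1, 2],
#        [1, 2, 3],
#        [2, 3, 4]])
np.mean(sliding_window_view(x, 3), axis=1)
# array([1., 2., 3.])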
Consider the following:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
import pandas as pd
data = np.random.randn(10**6)
data_pandas = pd.Series(data)
window = 2
%timeit np.mean(sliding_window_view(data, window), axis=1)
# 19.3 ms ± 255 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit data_pandas.rolling(window).mean()
# 34.3 ms ± 688 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
window = 1000
%timeit np.mean(sliding_window_view(data, window), axis=1)
# 302 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit data_pandas.rolling(window).mean()
# 31.7 ms ± 958 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
result_numpy = np.mean(sliding_window_view(data, window), axis=1)
result_pandas = data_pandas.rolling(window).mean()[window-1:]
np.allclose(result_numpy, result_pandas)
# True
For the larger window, the pandas timing is essentially unchanged (~32 ms), while the numpy implementation slows down by more than an order of magnitude (~300 ms).
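My understanding (which may well be wrong) is that the strided view is cheap to create, but np.mean still has to read every element of every window, so the numpy approach does O(N * window) work, whereas pandas appears to use some O(N) running aggregation. The shape and strides of the view illustrate how much data gets visited:
view = sliding_window_view(data, window)
view.shape    # (999001, 1000): np.mean visits ~10**9 elements
view.strides  # (8, 8): the rows overlap within the same underlying buffer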
What is pandas doing under the hood, and how can I get comparable performance for large windows using numpy?
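For what it's worth, a cumulative-sum approach along the following lines (my own sketch, not necessarily what pandas does internally, and with no handling of NaNs or accumulated floating-point error) seems to remove the dependence on the window size, but I don't know whether it is the idiomatic solution:
def rolling_mean_cumsum(a, window):
    # Hypothetical helper, not part of numpy or pandas: compute each window
    # sum as a difference of cumulative sums, O(N) work regardless of window.
    csum = np.cumsum(a, dtype=np.float64)
    out = np.empty(a.size - window + 1)
    out[0] = csum[window - 1]
    out[1:] = csum[window:] - csum[:-window]
    return out / window

np.allclose(rolling_mean_cumsum(data, window), result_numpy)
# expected True (up to floating-point error)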