
Improving upon this question, which provided a clever solution for applying a function over multiple columns in a DataFrame, I'm wondering whether the solution can be further optimized for speed.

Environment: Python 2.7.8, pandas 0.14.1, NumPy 1.8.

Here's the example setup:

import pandas as pd
import numpy as np
import random

def meanmax(ii, df):
    """Mean of the per-column maxima over the rows of df selected by ii."""
    xdf = df.iloc[map(int, ii)]
    n = max(xdf['A']) + max(xdf['B'])
    return n / 2.0

df  = pd.DataFrame(np.random.randn(2500,2)/10000, 
                    index=pd.date_range('2001-01-01',periods=2500),
                    columns=['A','B'])              
df['ii'] = range(len(df))      

res = pd.rolling_apply(df.ii, 26, lambda x: meanmax(x, df))

Note that the meanmax function is not pairwise, thus something like rolling_mean(df['A'] + df['B'],26) won't work.

However I can do something like:

res2 = (pd.rolling_max(df['A'],26) + pd.rolling_max(df['B'],26)) / 2

Which completes roughly 3000x faster:

%timeit res = pd.rolling_apply(df.ii, 26, lambda x: meanmax(x, df))
1 loops, best of 3: 1 s per loop

%timeit res2 = (pd.rolling_max(df['A'],26) + pd.rolling_max(df['B'],26)) / 2
1000 loops, best of 3: 325 µs per loop

Is there anything better than, or equivalent to, the second option above, given the example function and using rolling_apply? While the second option is faster, it doesn't use rolling_apply, which can be applied to a wider set of problems.

Edit: Performance timing correction
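For reference, the same comparison can be reproduced on modern pandas, where the rolling_* functions have been replaced by the .rolling accessor (a sketch assuming pandas >= 0.23 for the raw=True argument):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(200, 2),
                  index=pd.date_range('2001-01-01', periods=200),
                  columns=['A', 'B'])

# Generic version: roll over positional indices and look the rows up in df,
# mirroring the rolling_apply trick from the question.
ii = pd.Series(np.arange(len(df)), index=df.index)
slow = ii.rolling(26).apply(
    lambda w: (df['A'].values[w.astype(int)].max()
               + df['B'].values[w.astype(int)].max()) / 2.0,
    raw=True)

# Decomposed version: two O(n) rolling maxes plus elementwise arithmetic.
fast = (df['A'].rolling(26).max() + df['B'].rolling(26).max()) / 2
```

The performance gap between the two forms is the same as with the old API, since the underlying implementations are unchanged.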


2 Answers


Computing a generic rolling function over an array of size n with a window of size m requires roughly O(n*m) time. The built-in rolling_xxx methods use some pretty smart algorithms to keep running time well below that, and can often guarantee O(n) time, which, if you think about it, is a pretty impressive thing.

rolling_min and rolling_max in particular borrowed their implementation from bottleneck, which cites Richard Harter as the source of the algorithm, although I found what I think is an earlier description of the same algorithm in this paper.
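The O(n) behavior can be illustrated with the standard monotonic-deque formulation of a rolling max (a sketch of the idea, not bottleneck's actual implementation): each index enters and leaves the deque at most once, so the total work is linear regardless of the window size.

```python
from collections import deque

def rolling_max(a, w):
    """O(n) rolling max using a deque of indices with decreasing values."""
    out = [float('nan')] * len(a)
    dq = deque()  # indices into a; a[dq[0]] is the current window's max
    for i, x in enumerate(a):
        # Drop indices whose values can never be a window max again.
        while dq and a[dq[-1]] <= x:
            dq.pop()
        dq.append(i)
        # Drop the front index if it slid out of the current window.
        if dq[0] <= i - w:
            dq.popleft()
        if i >= w - 1:
            out[i] = a[dq[0]]
    return out

rolling_max([3, 1, 4, 1, 5, 9, 2, 6], 3)
# -> [nan, nan, 4, 4, 5, 9, 9, 9]
```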

So after the history lesson: it is very likely that you will not be able to have your cake and eat it too. rolling_apply is very convenient, but it is almost always going to sacrifice performance relative to a purpose-built algorithm. In my experience, one of the more enjoyable parts of using the Python scientific stack is coming up with efficient ways of doing computations, using the fast primitives provided in creative ways. Your own solution calling rolling_max twice is a good example of this. So relax and enjoy the ride, knowing that you will always have rolling_apply to fall back on if you, or the good people of SO, cannot come up with a smarter solution.

Jaime
  • Thanks - I had completely forgotten about the `bottleneck` module - which does explain why the double rolling_max is so fast. I'm just wondering if the original strategy above could be improved, say if perhaps `rolling_apply` could take something larger than a one dimensional ndarray. Then we wouldn't have to bother with doing an iloc within the `meanmax` function, nor the extra lambda call – bazel Aug 30 '14 at 23:41

You won't be able to get down to rolling_max speed, but you can often shave off an order of magnitude or so by dropping down to numpy via .values:

def meanmax_np(ii, df):
    # Index the underlying NumPy arrays directly, skipping pandas'
    # per-access overhead in .iloc.
    ii = ii.astype(int)
    n = df["A"].values[ii].max() + df["B"].values[ii].max()
    return n / 2.0

gives me

>>> %timeit res = pd.rolling_apply(df.ii, 26, lambda x: meanmax(x, df))
1 loops, best of 3: 701 ms per loop
>>> %timeit res_np = pd.rolling_apply(df.ii, 26, lambda x: meanmax_np(x, df))
10 loops, best of 3: 31.2 ms per loop
>>> %timeit res2 = (pd.rolling_max(df['A'],26) + pd.rolling_max(df['B'],26)) / 2
1000 loops, best of 3: 247 µs per loop

which, though still 100x slower than the optimized case, is much faster than the original. Sometimes all I need is for something to be ten times faster so that it stops being the dominant time sink, and then this suffices.

DSM
  • Good catch. I'm still looking for some magic (that perhaps doesn't exist) which could be applied in general to problems that require `rolling_apply`, but this example is informative for numpy use regardless - thanks. – bazel Aug 31 '14 at 13:40