
TL;DR: How can I improve my function so it outperforms pandas' own rolling maximum?


Background info:

I work with a lot of moving averages, moving maximums, moving minimums, and so on, and the only moving-window-like features I have found so far are in the pandas .rolling method. The thing is: my data comes as NumPy arrays, and the end result must be NumPy arrays as well. As much as I want to simply convert to a pandas Series and back to a NumPy array to do the job, like this:

result2_max = pd.Series(data_array).rolling(window).max().to_numpy()

, it feels unpythonic: the type conversions seem unnecessary, and there should be a way to do the exact same thing purely in NumPy.

However, unpythonic as it may seem, it is faster than any approach I have come up with or seen online. Here is a small benchmark:

import numpy as np
import pandas as pd

def numpy_rolling_max(data, window):

    data = data[::-1]
    stride = data.strides[0]

    movin_window = np.lib.stride_tricks.as_strided(
        data,
        shape=(data.shape[0] - window + 1, window),
        strides=(stride, stride),
    )[::-1]
    max_window = np.amax(movin_window, axis=1)  # this line seems to be the bottleneck

    nan_array = np.full(window - 1, np.nan)
    return np.hstack((nan_array, max_window))


def pandas_rolling_max(data, window):
    return pd.Series(data).rolling(window).max().to_numpy()

length = 120000
window = 190
data = np.arange(length) + 0.5

result1_max = numpy_rolling_max(data, window)   # 21.9 ms per loop
result2_max = pandas_rolling_max(data, window)  # 5.43 ms per loop

result_comparison = np.allclose(result1_max, result2_max, equal_nan=True)

With array size = 120k and window = 190, the pandas rolling maximum is about 3 times faster than the NumPy version. I have no clue where to proceed: I have already vectorized my function as much as I can, yet it is still much slower than the pandas version, and I don't really know why.

Thank you in advance.

EDIT: I have found the bottleneck, and it is this line:

max_window = np.amax(movin_window, axis=1)

But seeing that it is already a vectorized function call, I still have no clue how to proceed.
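For completeness, the same idea can be written with np.lib.stride_tricks.sliding_window_view (available in NumPy 1.20+, if I am not mistaken). It avoids the manual stride arithmetic, but the per-window np.amax reduction - the bottleneck above - is unchanged, so it is not faster by itself:

```python
import numpy as np

def sliding_view_rolling_max(data, window):
    # sliding_window_view builds the same read-only strided view as
    # as_strided, without manual stride arithmetic
    windows = np.lib.stride_tricks.sliding_window_view(data, window)
    out = np.full(data.shape[0], np.nan)
    out[window - 1:] = windows.max(axis=1)  # still O(n * window) work
    return out

data = np.arange(10) + 0.5
print(sliding_view_rolling_max(data, 3))
```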

mathguy
  • Did you try `convolve` in numpy? – BENY May 20 '19 at 03:30
  • @WeNYoBen I haven't, and I don't know how convolve may help... Do you mind showing me how? – mathguy May 20 '19 at 03:32
  • Check https://stackoverflow.com/questions/43288542/max-in-a-sliding-window-in-numpy-array – BENY May 20 '19 at 03:42
  • @WeNYoBen checked it; unfortunately the pandas default version seems to outperform the answers under that post; also, np.convolve doesn't play any part in the solutions there :/ – mathguy May 20 '19 at 04:00
  • Did you try out the SciPy version (the from scipy.ndimage.filters import maximum_filter1d one) - https://stackoverflow.com/a/43288787/? – Divakar May 20 '19 at 04:48
  • @Divakar I have been waiting for your reply, since I have seen many fantastic answers of yours on other posts. Yes, I tried it, but the border/end points of the result are not the same as in the pandas version - I mean the length of the output array, the number of np.nan values, etc. I also tried some alterations of it, but couldn't find an obvious pattern in the NaN counts. – mathguy May 20 '19 at 04:53

1 Answer


We can use the 1D max filter from SciPy to replicate the same behavior as the pandas one and still be a bit more efficient -

import numpy as np
from scipy.ndimage.filters import maximum_filter1d

def max_filter1d_same(a, W, fillna=np.nan):
    out_dtype = np.full(0, fillna).dtype  # dtype that fillna promotes to
    hW = (W - 1) // 2  # half window size
    # origin=hW shifts the filter so each window ends at the current index
    out = maximum_filter1d(a, size=W, origin=hW)
    if out.dtype is out_dtype:
        out[:W - 1] = fillna  # same dtype: fill the warm-up region in place
    else:
        out = np.concatenate((np.full(W - 1, fillna), out[W - 1:]))
    return out

Sample runs -

In [161]: np.random.seed(0)
     ...: a = np.random.randint(0,999,(20))
     ...: window = 3

In [162]: a
Out[162]: 
array([684, 559, 629, 192, 835, 763, 707, 359,   9, 723, 277, 754, 804,
       599,  70, 472, 600, 396, 314, 705])

In [163]: pd.Series(a).rolling(window).max().to_numpy()
Out[163]: 
array([ nan,  nan, 684., 629., 835., 835., 835., 763., 707., 723., 723.,
       754., 804., 804., 804., 599., 600., 600., 600., 705.])

In [164]: max_filter1d_same(a,window)
Out[164]: 
array([ nan,  nan, 684., 629., 835., 835., 835., 763., 707., 723., 723.,
       754., 804., 804., 804., 599., 600., 600., 600., 705.])

# Use same dtype fillna for better memory efficiency
In [165]: max_filter1d_same(a,window,fillna=0)
Out[165]: 
array([  0,   0, 684, 629, 835, 835, 835, 763, 707, 723, 723, 754, 804,
       804, 804, 599, 600, 600, 600, 705])
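A self-contained equivalence check against pandas over a few window sizes (note: recent SciPy versions expose maximum_filter1d directly from scipy.ndimage, as the scipy.ndimage.filters path is deprecated; the function is repeated here so the snippet runs on its own) -

```python
import numpy as np
import pandas as pd
from scipy.ndimage import maximum_filter1d  # modern import path

def max_filter1d_same(a, W, fillna=np.nan):
    out_dtype = np.full(0, fillna).dtype
    hW = (W - 1) // 2  # half window size
    out = maximum_filter1d(a, size=W, origin=hW)
    if out.dtype is out_dtype:
        out[:W - 1] = fillna
    else:
        out = np.concatenate((np.full(W - 1, fillna), out[W - 1:]))
    return out

a = np.random.RandomState(0).randint(0, 999, 120000)
for window in (1, 5, 190):
    expected = pd.Series(a).rolling(window).max().to_numpy()
    assert np.allclose(max_filter1d_same(a, window), expected, equal_nan=True)
```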

Timings on actual test-case sizes -

In [171]: # Actual test-cases sizes
     ...: np.random.seed(0)
     ...: data_array = np.random.randint(0,999,(120000))
     ...: window = 190

In [172]: %timeit pd.Series(data_array).rolling(window).max().to_numpy()
100 loops, best of 3: 4.43 ms per loop

In [173]: %timeit max_filter1d_same(data_array,window)
100 loops, best of 3: 1.95 ms per loop
Divakar
  • 1 upvote is not enough! I just tested the answer with a loop over window sizes from 1 to 1000 on 2 different data arrays and the results match, with a 200+% speed-up. Thank you sir – mathguy May 20 '19 at 05:57
  • I also have one more question, though. What is the 'secret' underneath the max/min filter1d in scipy that makes computing max/min over a moving window so fast? Can such a 'secret' apply to any function applied to a moving window, not just max and min? I am still new to the numpy stuff and would like to learn more if possible. – mathguy May 20 '19 at 06:20
  • @mathguy Think it's implemented in C under the hood and is customized for array data, so it's supposed to be efficient that way, leveraging the spatial locality of array data. – Divakar May 20 '19 at 06:25
  • Is there any way to leverage the spatial locality of array data using Cython or an equivalent, if I were to write another function to apply over a moving window? If there is, it would be a huge step up for any pandas rolling calculation. – mathguy May 20 '19 at 06:30
  • @mathguy Cython should definitely help and AFAIK would make use of the spatial locality. So, do give it a try. – Divakar May 20 '19 at 07:46
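As an aside on the comment thread above: the classic trick behind O(n) rolling max/min (a sketch of the algorithmic idea, not a claim about SciPy's actual source) is a monotonic deque, where every index is pushed and popped at most once:

```python
from collections import deque
import numpy as np

def deque_rolling_max(data, window):
    # dq holds candidate indices; their values stay in decreasing order,
    # so the current window maximum is always data[dq[0]]
    out = np.full(len(data), np.nan)
    dq = deque()
    for i, x in enumerate(data):
        while dq and data[dq[-1]] <= x:  # drop dominated candidates
            dq.pop()
        dq.append(i)
        if dq[0] <= i - window:          # drop the index that left the window
            dq.popleft()
        if i >= window - 1:
            out[i] = data[dq[0]]
    return out
```

Pure Python is of course slow per element, but ported to Cython (typed memoryviews plus a C-level ring buffer), this kind of O(n) routine is what makes large-window rolling extremes fast regardless of window size.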