
TL;DR: How can I improve my function so it outperforms pandas' own rolling maximum?


Background info:

I work with a lot of moving averages, moving maximums, moving minimums, and so on, and the only moving-window-like features I have found so far are in the pandas .rolling method. The thing is: my data comes as NumPy arrays, and the end result must be NumPy arrays as well. As much as I want to simply convert to a pandas Series and back to a NumPy array to do the job, like this:

result2_max = pd.Series(data_array).rolling(window).max().to_numpy()

, it feels unpythonic: the type conversions seem unnecessary, and there should be a way to do the exact same thing purely in NumPy.

However, unpythonic as it may seem, it is faster than any approach I have come up with or seen online. Here is a small benchmark:

import numpy as np
import pandas as pd

def numpy_rolling_max(data, window):

    data = data[::-1]
    stride = data.strides[0]

    movin_window = np.lib.stride_tricks.as_strided(
        data,
        shape=(data.shape[0] - window + 1, window),
        strides=(stride, stride),
    )[::-1]
    max_window = np.amax(movin_window, axis=1)  # this line seems to be the bottleneck

    nan_array = np.full(window - 1, np.nan)
    return np.hstack((nan_array, max_window))


def pandas_rolling_max(data, window):
    return pd.Series(data).rolling(window).max().to_numpy()

length = 120000
window = 190
data = np.arange(length) + 0.5

result1_max = numpy_rolling_max(data, window)   # 21.9 ms per loop
result2_max = pandas_rolling_max(data, window)  # 5.43 ms per loop

result_comparison = np.allclose(result1_max, result2_max, equal_nan=True)

With array size = 120k and window = 190, the pandas rolling maximum is about 3 times faster than the NumPy version. I have no clue where to proceed: I have already vectorized my function as much as I can, yet it is still much slower than the pandas version, and I don't really know why.

Thank you in advance.

EDIT: I have found the bottleneck, and it is this line:

max_window = np.amax(movin_window, axis=1)

But seeing that it is already a vectorized function call, I still have no clue how to proceed.
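For completeness, the same idea can be written with np.lib.stride_tricks.sliding_window_view (available in NumPy 1.20+, if I am not mistaken). It avoids the manual stride arithmetic, but the per-window np.amax reduction - the bottleneck above - is unchanged, so it is not faster by itself:

```python
import numpy as np

def sliding_view_rolling_max(data, window):
    # sliding_window_view builds the same read-only strided view as
    # as_strided, without manual stride arithmetic
    windows = np.lib.stride_tricks.sliding_window_view(data, window)
    out = np.full(data.shape[0], np.nan)
    out[window - 1:] = windows.max(axis=1)  # still O(n * window) work
    return out

data = np.arange(10) + 0.5
print(sliding_view_rolling_max(data, 3))
```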

mathguy
  • Did you try `convolve` in numpy? – BENY May 20 '19 at 03:30
  • @WeNYoBen I haven't, and I don't know how convolve may help... Do you mind showing me how? – mathguy May 20 '19 at 03:32
  • Check https://stackoverflow.com/questions/43288542/max-in-a-sliding-window-in-numpy-array – BENY May 20 '19 at 03:42
  • @WeNYoBen checked it; unfortunately the pandas default version seems to outperform the answers under that post; also, np.convolve doesn't play any part in the solutions there :/ – mathguy May 20 '19 at 04:00
  • Did you try out the SciPy version (the from scipy.ndimage.filters import maximum_filter1d one) - https://stackoverflow.com/a/43288787/? – Divakar May 20 '19 at 04:48
  • @Divakar I have been waiting for your reply, since I have seen many fantastic answers of yours on other posts. Yes, I tried it, but the border/end points of the result are not the same as in the pandas version - I mean the length of the output array, the number of np.nan values, etc. I also tried some alterations of it, but couldn't find an obvious pattern in the NaN counts. – mathguy May 20 '19 at 04:53

1 Answer


We can use the 1D max filter from SciPy to replicate the same behavior as the pandas one and still be a bit more efficient -

import numpy as np
from scipy.ndimage.filters import maximum_filter1d

def max_filter1d_same(a, W, fillna=np.nan):
    out_dtype = np.full(0, fillna).dtype  # dtype that fillna promotes to
    hW = (W - 1) // 2  # half window size
    # origin=hW shifts the filter so each window ends at the current index
    out = maximum_filter1d(a, size=W, origin=hW)
    if out.dtype is out_dtype:
        out[:W - 1] = fillna  # same dtype: fill the warm-up region in place
    else:
        out = np.concatenate((np.full(W - 1, fillna), out[W - 1:]))
    return out

Sample runs -

In [161]: np.random.seed(0)
     ...: a = np.random.randint(0,999,(20))
     ...: window = 3

In [162]: a
Out[162]: 
array([684, 559, 629, 192, 835, 763, 707, 359,   9, 723, 277, 754, 804,
       599,  70, 472, 600, 396, 314, 705])

In [163]: pd.Series(a).rolling(window).max().to_numpy()
Out[163]: 
array([ nan,  nan, 684., 629., 835., 835., 835., 763., 707., 723., 723.,
       754., 804., 804., 804., 599., 600., 600., 600., 705.])

In [164]: max_filter1d_same(a,window)
Out[164]: 
array([ nan,  nan, 684., 629., 835., 835., 835., 763., 707., 723., 723.,
       754., 804., 804., 804., 599., 600., 600., 600., 705.])

# Use same dtype fillna for better memory efficiency
In [165]: max_filter1d_same(a,window,fillna=0)
Out[165]: 
array([  0,   0, 684, 629, 835, 835, 835, 763, 707, 723, 723, 754, 804,
       804, 804, 599, 600, 600, 600, 705])
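A self-contained equivalence check against pandas over a few window sizes (note: recent SciPy versions expose maximum_filter1d directly from scipy.ndimage, as the scipy.ndimage.filters path is deprecated; the function is repeated here so the snippet runs on its own) -

```python
import numpy as np
import pandas as pd
from scipy.ndimage import maximum_filter1d  # modern import path

def max_filter1d_same(a, W, fillna=np.nan):
    out_dtype = np.full(0, fillna).dtype
    hW = (W - 1) // 2  # half window size
    out = maximum_filter1d(a, size=W, origin=hW)
    if out.dtype is out_dtype:
        out[:W - 1] = fillna
    else:
        out = np.concatenate((np.full(W - 1, fillna), out[W - 1:]))
    return out

a = np.random.RandomState(0).randint(0, 999, 120000)
for window in (1, 5, 190):
    expected = pd.Series(a).rolling(window).max().to_numpy()
    assert np.allclose(max_filter1d_same(a, window), expected, equal_nan=True)
```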

Timings on actual test-case sizes -

In [171]: # Actual test-cases sizes
     ...: np.random.seed(0)
     ...: data_array = np.random.randint(0,999,(120000))
     ...: window = 190

In [172]: %timeit pd.Series(data_array).rolling(window).max().to_numpy()
100 loops, best of 3: 4.43 ms per loop

In [173]: %timeit max_filter1d_same(data_array,window)
100 loops, best of 3: 1.95 ms per loop
Divakar
  • 1 upvote is not enough! I just tested the answer with a loop over window sizes from 1 to 1000 on 2 different data arrays and the results match, with a 200+% speed-up. Thank you sir – mathguy May 20 '19 at 05:57
  • I also have one more question, though. What is the 'secret' underneath the max/min filter1d in scipy that makes computing max/min over a moving window so fast? Can such a 'secret' apply to any function applied to a moving window, not just max and min? I am still new to the numpy stuff and would like to learn more if possible. – mathguy May 20 '19 at 06:20
  • @mathguy Think it's implemented in C under the hood and is customized for array data, so it's supposed to be efficient that way, leveraging the spatial locality of array data. – Divakar May 20 '19 at 06:25
  • Is there any way to leverage the spatial locality of array data using Cython or an equivalent, if I were to write another function to apply over a moving window? If there is, it would be a huge step up for any pandas rolling calculation. – mathguy May 20 '19 at 06:30
  • @mathguy Cython should definitely help and AFAIK would make use of the spatial locality. So, do give it a try. – Divakar May 20 '19 at 07:46
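As an aside on the comment thread above: the classic trick behind O(n) rolling max/min (a sketch of the algorithmic idea, not a claim about SciPy's actual source) is a monotonic deque, where every index is pushed and popped at most once:

```python
from collections import deque
import numpy as np

def deque_rolling_max(data, window):
    # dq holds candidate indices; their values stay in decreasing order,
    # so the current window maximum is always data[dq[0]]
    out = np.full(len(data), np.nan)
    dq = deque()
    for i, x in enumerate(data):
        while dq and data[dq[-1]] <= x:  # drop dominated candidates
            dq.pop()
        dq.append(i)
        if dq[0] <= i - window:          # drop the index that left the window
            dq.popleft()
        if i >= window - 1:
            out[i] = data[dq[0]]
    return out
```

Pure Python is of course slow per element, but ported to Cython (typed memoryviews plus a C-level ring buffer), this kind of O(n) routine is what makes large-window rolling extremes fast regardless of window size.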