These are not a solution, at most workarounds for simple cases like the example function. But they confirm the suspicion that the processing speed of df.rolling.apply is anything but optimal.
Using a much smaller dataset, for obvious reasons:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(200, 100))
period = 10
res = [0, 0]  # holds both results so they can be compared afterwards
Running time with pandas v1.3.5:
%%timeit -n1 -r1
dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))  # max drawdown within the window
res[0] = df.rolling(window=period, min_periods=1).apply(dd)
# 1 loop, best of 1: 8.72 s per loop
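Part of that cost is apply calling the Python function once per window and per column, building a Series for every call. As an aside, passing raw=True so the function receives plain ndarrays trims some of that overhead, but it remains far from the numpy version below:
%%timeit -n1 -r1
df.rolling(window=period, min_periods=1).apply(lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x)), raw=True)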
Against a numpy implementation:
from numpy.lib.stride_tricks import sliding_window_view as window
%%timeit
# pad period-1 rows of NaN on top so every output row gets a full-width window
x = window(np.vstack([np.full((period-1, df.shape[1]), np.nan), df.to_numpy()]), period, axis=0)
# the NaNs are ignored by fmax/nanmax, which reproduces min_periods=1
res[1] = np.nanmax(1.0 - x / np.fmax.accumulate(x, axis=-1), axis=-1)
# 100 loops, best of 5: 3.39 ms per loop
np.testing.assert_allclose(res[0], res[1])
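To see what the NaN padding buys, a minimal sketch (toy sizes, not the benchmark data): sliding_window_view yields one length-period window per input row, and the leading NaNs emulate min_periods=1 because fmax and nanmax both ignore them:
a = np.arange(5, dtype=float).reshape(-1, 1)      # one column, 5 rows
padded = np.vstack([np.full((2, 1), np.nan), a])  # pad period-1 = 2 NaN rows for period = 3
w = window(padded, 3, axis=0)                     # shape (5, 1, 3): one window per original row
print(w[0, 0])                                    # [nan nan 0.]: the first row's partial window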
8.72 s / 3.39 ms = 8720 / 3.39 ≈ 2572x speedup.
Processing columns in chunks:
l = []
for arr in np.array_split(df.to_numpy(), 100, axis=1):
    x = window(np.vstack([np.full((period-1, arr.shape[1]), np.nan), arr]), period, axis=0)
    l.append(np.nanmax(1.0 - x / np.fmax.accumulate(x, axis=-1), axis=-1))
res[1] = np.hstack(l)
# 1 loop, best of 5: 9.15 s per loop for df.shape (2000,2000)
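Why chunk at all? My assumption: the elementwise expression materializes full (rows, cols, period) float64 temporaries, so chunking the columns bounds peak memory:
rows, cols = 2000, 2000
per_temp = rows * cols * period * 8  # bytes in one (rows, cols, period) float64 temporary
print(per_temp / 2**20)              # ~305 MiB, and the expression creates several of these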
Using pandas' numba engine
We can get even faster with pandas' support for numba-jitted functions. Unfortunately numba v0.55.1 can't compile ufunc.accumulate, so we have to write our own implementation of np.fmax.accumulate (no guarantees on my implementation). Note that the first call is slower because the function needs to be compiled.
def dd_numba(x):
    # hand-rolled np.fmax.accumulate: carry the running max forward, skipping NaNs
    res = np.empty_like(x)
    res[0] = x[0]
    for i in range(1, len(res)):
        if res[i-1] > x[i] or np.isnan(x[i]):
            res[i] = res[i-1]
        else:
            res[i] = x[i]
    return np.nanmax(1.0 - x / res)
df.rolling(window=period, min_periods=1).apply(dd_numba, engine='numba', raw=True)
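Since I give no guarantees on that implementation, here is a quick sanity check of the hand-rolled accumulate against np.fmax.accumulate in plain numpy (same recurrence as the loop inside dd_numba, minus the drawdown step):
def fmax_acc(x):
    # same recurrence as the loop inside dd_numba
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = out[i-1] if (out[i-1] > x[i] or np.isnan(x[i])) else x[i]
    return out

a = np.random.rand(50)
a[::7] = np.nan  # exercise the NaN branch
np.testing.assert_allclose(fmax_acc(a), np.fmax.accumulate(a))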
We can use the familiar pandas interface, and it's ~1.16x faster than my chunked numpy approach for df.shape (2000, 2000).
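For completeness, a sketch of how the (2000, 2000) comparison could be reproduced (timings vary by machine; the warm-up call keeps compilation out of the measurement):
big = pd.DataFrame(np.random.rand(2000, 2000))
big.rolling(window=period, min_periods=1).apply(dd_numba, engine='numba', raw=True)  # warm-up: triggers compilation
%timeit -n1 -r1 big.rolling(window=period, min_periods=1).apply(dd_numba, engine='numba', raw=True)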