I want to compute the equivalent of a basic dataframe.column.rolling(window).max(),
but with a window size that varies per row: it comes from another column of arbitrary integers derived at an earlier stage.
However, methods similar to those found here, here, or here appear to be extremely slow, to the point of being unusable when the dataframe is large.
I suspect this is because SIMD hardware prefers constant window sizes, but I wonder whether there is an approach I have missed.
Example data (as found in the first method linked above):
import pandas as pd
import numpy as np
np.random.seed([3,14])
a = np.random.randn(20).cumsum()
w = np.minimum(
    np.random.randint(1, 4, size=a.shape),
    np.arange(len(a)) + 1
)
df = pd.DataFrame({'Data': a, 'Window': w})
df
Data Window
0 -0.602923 1
1 -1.005579 2
2 -0.703250 3
3 -1.227599 1
4 -0.683756 1
5 -0.670621 2
6 -0.997120 1
7 0.387956 3
8 0.255502 1
9 -0.152361 2
10 1.150534 3
11 0.546298 3
12 0.302936 3
13 0.091674 1
14 -1.964947 1
15 -1.447079 2
16 -1.487828 1
17 -2.539703 1
18 -1.932612 3
19 -4.163049 2
Expected result of an equivalent rolling window max:
0 -0.602923
1 -0.602923
2 -0.602923
3 -1.227599
4 -0.683756
5 -0.670621
6 -0.997120
7 0.387956
8 0.255502
9 0.255502
10 1.150534
11 1.150534
12 1.150534
13 0.091674
14 -1.964947
15 -1.447079
16 -1.487828
17 -2.539703
18 -1.487828
19 -1.932612
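To make the intended semantics concrete: for row i, the result is the max of Data over the last Window[i] rows, ending at row i. Below is a minimal sketch of one vectorized way to express that, not taken from the linked methods. The function name variable_rolling_max is hypothetical, and the code assumes NumPy 1.20 or later (for sliding_window_view), integer windows with 1 <= Window[i], and a largest window small enough that an n-by-max(Window) temporary array fits in memory.

```python
import numpy as np
import pandas as pd


def variable_rolling_max(data, window):
    """Return max(data[i - window[i] + 1 : i + 1]) for every row i.

    Hypothetical helper; assumes each window[i] is an integer >= 1.
    Windows that reach before row 0 are simply truncated at row 0.
    """
    data = np.asarray(data, dtype=float)
    window = np.asarray(window, dtype=np.int64)
    max_w = int(window.max())

    # Pad the front with -inf so every row has max_w trailing values.
    padded = np.concatenate([np.full(max_w - 1, -np.inf), data])

    # views[i] is data[i - max_w + 1 : i + 1], oldest value first.
    views = np.lib.stride_tricks.sliding_window_view(padded, max_w)

    # Column j of views is (max_w - 1 - j) rows behind the current row.
    distance = np.arange(max_w)[::-1]

    # Keep only the last window[i] values in each row, then reduce.
    keep = distance < window[:, None]
    return np.where(keep, views, -np.inf).max(axis=1)


# Small hand-checkable example (not the data from the question).
example = pd.DataFrame({"Data": [1.0, 3.0, 2.0, 5.0, 4.0],
                        "Window": [1, 2, 3, 1, 2]})
example["RollingMax"] = variable_rolling_max(example["Data"], example["Window"])
```

The cost is proportional to len(data) * max(Window), so this only looks attractive when the largest window is small compared with the number of rows; whether it is fast enough on a large dataframe would need to be measured.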