1

I have a data frame with two columns j and k. I'd like to do a 3-element forward rolling calculation like this:

  1. Find the first element (in order) in k that's smaller than j if it exists

  2. Otherwise - if all elements in k are greater than j - use the last (in order) from k

The result would go into column l.

Example:

numpy.random.seed(1)

df = pandas.DataFrame(dict(
  j=numpy.random.randint(10, 99, 12),
  k=numpy.random.randint(10, 99, 12),
))

# Expected result
df['l'] = [16, 30, 60, 94, 28, 38, 21, 38, 24, 78, numpy.nan, numpy.nan]

print(df)

Dataframe with j and k which are the input columns and manually populated l with the expected result (with some comments):

     j   k     l
0   47  16  16.0 # Case 1, 47 > 16, so put 16
1   22  35  30.0 # Case 2, 22 < [35, 60, 30], so put 30
2   82  60  60.0 # Case 1, 82 > 60, so put 60
3   19  30  94.0 # Case 2, 19 < [30, 28, 94], so put 94
4   85  28  28.0 # Case 1, 85 > 28, so put 28
5   15  94  38.0 # Case 1, 15 < [94, 21, 38], so put 38
6   89  21  21.0 # Case 1, 89 > 21, so put 21
7   74  38  38.0 # Case 1, 74 < 38, so put 38
8   26  39  24.0 # Case 1, 26 < 39, but 26 > 24, so put 24
9   11  24  78.0 # Case 2, 11 < [24, 60, 78], so put 78
10  86  60   NaN # If easier, l = 60 is fine
11  81  78   NaN # If easier, l = 78 is fine

Any idea how I can do this using pandas without manually iterating (i.e. benefiting from vectorization)?

levant pied
  • 3,886
  • 5
  • 37
  • 56

1 Answers1

1

Here is a way using shift to align all values of the rolling windows on k with the value of j. Then keep only the values where they are less than the column j. Use bfill along the columns to get the first non null value found in the row back to the first column, that you can select with iloc[:, 0] and fillna the rows with no value using the last value in the rolling window of k.

roll_win = 3
df['l_calc'] = (
    pd.concat([df['k'].shift(-i) for i in range(roll_win)], axis=1)
      .where(lambda x: x.lt(df['j'], axis=0))
      .bfill(axis=1)
      .iloc[:,0]
      .fillna(df['k'].shift(-roll_win+1))
)
print(df)
#      j   k     l  l_calc
# 0   47  16  16.0    16.0
# 1   22  35  30.0    30.0
# 2   82  60  60.0    60.0
# 3   19  30  94.0    94.0
# 4   85  28  28.0    28.0
# 5   15  94  38.0    38.0
# 6   89  21  21.0    21.0
# 7   74  38  38.0    38.0
# 8   26  39  24.0    24.0
# 9   11  24  78.0    78.0
# 10  86  60   NaN    60.0
# 11  81  78   NaN    78.0

Note: technically, because of the fillna at the end, you don't need to care about the comparison with this column and in the concat, do only the range(win_roll-1), you would save some time if your data is large

Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • 1
    Nice solution, thanks! Curious, is a solution possible with forward-looking rolling windows? – levant pied Nov 01 '22 at 20:57
  • 1
    @levantpied so forward rolling is usually done by reversing the order of the dataframe and then do a regular backward rolling and then reverse the order. That said, rolling using several columns in the rolling function is not that easy, [see this QA for example](https://stackoverflow.com/questions/60736556/pandas-rolling-apply-using-multiple-columns) except if new version of pandas have this implemented. – Ben.T Nov 02 '22 at 03:11
  • 1
    @levantpied If your concern is about bigger window, hence a lot of shift, you can have a look at a [numpy version of rolling windows](https://numpy.org/doc/stable/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html#numpy.lib.stride_tricks.sliding_window_view), I never used it maybe – Ben.T Nov 02 '22 at 03:14
  • Great, thanks Ben.T! I'll check these out. – levant pied Nov 02 '22 at 13:42