I have a data frame with two columns j
and k
. I'd like to do a 3-element forward rolling calculation like this:
Find the first element (in order) in
k
that's smaller thanj
if it existsOtherwise - if all elements in
k
are greater thanj
- use the last (in order) fromk
The result would go into column l
.
Example:
numpy.random.seed(1)
df = pandas.DataFrame(dict(
j=numpy.random.randint(10, 99, 12),
k=numpy.random.randint(10, 99, 12),
))
# Expected result
df['l'] = [16, 30, 60, 94, 28, 38, 21, 38, 24, 78, numpy.nan, numpy.nan]
print(df)
Dataframe with j
and k
which are the input columns and manually populated l
with the expected result (with some comments):
j k l
0 47 16 16.0 # Case 1, 47 > 16, so put 16
1 22 35 30.0 # Case 2, 22 < [35, 60, 30], so put 30
2 82 60 60.0 # Case 1, 82 > 60, so put 60
3 19 30 94.0 # Case 2, 19 < [30, 28, 94], so put 94
4 85 28 28.0 # Case 1, 85 > 28, so put 28
5 15 94 38.0 # Case 1, 15 < [94, 21, 38], so put 38
6 89 21 21.0 # Case 1, 89 > 21, so put 21
7 74 38 38.0 # Case 1, 74 < 38, so put 38
8 26 39 24.0 # Case 1, 26 < 39, but 26 > 24, so put 24
9 11 24 78.0 # Case 2, 11 < [24, 60, 78], so put 78
10 86 60 NaN # If easier, l = 60 is fine
11 81 78 NaN # If easier, l = 78 is fine
Any idea how I can do this using pandas without manually iterating (i.e. benefiting from vectorization)?