Issue
I'm trying to use Pandas to create a new column populated with the next highest value for each id, that is, for each row, the maximum value among the later rows that share the same id.
Below are the input data and the output I expect to get.
I explored where/groupby/transform as described here: Pandas: find maximum value, when and if conditions. But it doesn't fit my use case, because I need a where
condition that depends on each row, not on a constant value.
N.B. This is equivalent to the Excel formula MAXIFS: MAXIFS(C:C;A:A;A2;B:B;">"&B2)
where: A = id; B = date; C = value
Input data
from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ["a"] * 2 + ["b"] * 4 + ["a", "b"] * 2 + ["b"],
    "date": pd.date_range(datetime(2023, 1, 1), periods=11).tolist(),
    "value": [3, 10, 2, 20, 24, 9, 21, 7, 25, 12, 7]
})
#    id       date  value
# 0   a 2023-01-01      3
# 1   a 2023-01-02     10
# 2   b 2023-01-03      2
# 3   b 2023-01-04     20
# 4   b 2023-01-05     24
# 5   b 2023-01-06      9
# 6   a 2023-01-07     21
# 7   b 2023-01-08      7
# 8   a 2023-01-09     25
# 9   b 2023-01-10     12
# 10  b 2023-01-11      7
Expected output
df_expected = pd.concat(
    [df, pd.DataFrame({
        "next_local_max": [25, 25, 24, 24, 12, 12, 25, 12, np.nan, 7, np.nan]
    })],
    axis=1
)
#    id       date  value  next_local_max
# 0   a 2023-01-01      3            25.0
# 1   a 2023-01-02     10            25.0
# 2   b 2023-01-03      2            24.0
# 3   b 2023-01-04     20            24.0
# 4   b 2023-01-05     24            12.0
# 5   b 2023-01-06      9            12.0
# 6   a 2023-01-07     21            25.0
# 7   b 2023-01-08      7            12.0
# 8   a 2023-01-09     25             NaN
# 9   b 2023-01-10     12             7.0
# 10  b 2023-01-11      7             NaN
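For reference, here is a minimal sketch of one way this column might be computed. It is not necessarily the best approach. It assumes dates are unique within each id, so that "strictly later" is unambiguous, and the helper name add_next_max is hypothetical. The idea is to walk each id from the latest date backwards, keep a running maximum, and shift it by one row so that a row never sees its own value.

```python
from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": ["a"] * 2 + ["b"] * 4 + ["a", "b"] * 2 + ["b"],
    "date": pd.date_range(datetime(2023, 1, 1), periods=11).tolist(),
    "value": [3, 10, 2, 20, 24, 9, 21, 7, 25, 12, 7],
})


def add_next_max(frame):
    """Hypothetical helper: per-id maximum of 'value' over strictly later dates.

    Assumes dates are unique within each id; tied dates would need extra care.
    """
    # Latest rows first, so a cumulative max covers "this row and everything later".
    ordered = frame.sort_values("date", ascending=False)
    running_max = ordered.groupby("id")["value"].cummax()
    # Shift by one row within each id to exclude the row's own value;
    # the latest row of each id has nothing after it and becomes NaN.
    next_max = running_max.groupby(ordered["id"]).shift()
    out = frame.copy()
    # Assignment aligns on the original index, restoring the input row order.
    out["next_local_max"] = next_max
    return out


result = add_next_max(df)
```

On the sample data above, result["next_local_max"] should match the next_local_max column of df_expected. This avoids a row-by-row comparison, at the cost of relying on the ordering assumption stated above.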
Edit
This question has been well answered. I have tried to extend and generalize it to cases where more complex max conditions are needed: Perform Excel MAXIFS in Pandas with multiple conditions