OK, I've taken a crack at this - first I recreated the dataframe:
import pandas as pd

data = [
    ("03/03/2021", "APPLE", 732, 754.95, 723.4, 729.85, None),
    ("04/03/2021", "APPLE", 733.25, 765.7, 715.85, 752.45, "act"),
    ("05/03/2021", "APPLE", 752.45, 761, 730.5, 748.95, None),
    ("08/03/2021", "APPLE", 762.7, 767.8, 744.2, 748.4, None),
    ("09/03/2021", "APPLE", 755.55, 759.4, 738.65, 750.75, "act"),
    ("10/03/2021", "APPLE", 757.5, 753.1, 743, 745.35, None),
    ("12/03/2021", "APPLE", 743, 752.1, 723, 728.15, None),
    ("15/03/2021", "APPLE", 727.8, 727.8, 706.05, 719.05, None),
    ("03/03/2021", "ORANGE", 2406, 2417.7, 2375.8, 2402.1, None),
    ("04/03/2021", "ORANGE", 2380, 2435, 2350, 2417.1, "act"),
    ("05/03/2021", "ORANGE", 2399, 2423.9, 2377.1, 2387.1, None),
    ("08/03/2021", "ORANGE", 2383, 2413.5, 2360.05, 2382.7, None),
    ("09/03/2021", "ORANGE", 2400, 2444, 2396.15, 2422.7, None),
    ("10/03/2021", "ORANGE", 2446, 2446, 2415.55, 2431.95, "act"),
    ("12/03/2021", "ORANGE", 2442.8, 2464.65, 2397, 2401.35, None),
    ("15/03/2021", "ORANGE", 2402.55, 2427.55, 2343.05, 2355, None),
]

df = (
    pd.DataFrame(data, columns=["date", "symbol", "open", "high", "low", "close", "mark"])
    .sort_values(by=["symbol", "date"])
    .fillna("")
    .reset_index(drop=True)
)
I figure that what you want is a simple max on a group-by. The tricky part is manipulating your data so it conforms with what group-by expects, that is, a field on which to group.
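Just to illustrate the shape of that idea (a toy frame with a made-up block column, not the final approach):

toy = pd.DataFrame({"block": ["a", "a", "b", "b"], "high": [1.0, 3.0, 5.0, 4.0]})
toy.groupby("block")["high"].max()
# block
# a    3.0
# b    5.0
# Name: high, dtype: float64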
def block_diff(series, trigger, start_stop=False):
    """Return a boolean Series flagging the rows that fall inside a block.

    A block opens when `trigger` is seen; with start_stop=True the next
    occurrence of `trigger` closes the block (and is itself included in it).
    """
    toggle = False
    rs = list()
    for i, v in series.items():  # Series.iteritems() is deprecated in newer pandas
        if v == trigger:
            if start_stop and toggle:
                # closing trigger: keep this row in the block, then switch off
                rs.append(toggle)
                toggle = not toggle
            elif start_stop and not toggle:
                # opening trigger: switch on, then keep this row in the block
                toggle = not toggle
                rs.append(toggle)
            elif not start_stop:
                # plain toggle mode: every trigger simply flips the state
                toggle = not toggle
                rs.append(toggle)
        else:
            rs.append(toggle)
    return pd.Series(rs)
So that's the helper defined. The idea is to block out the regions that will feature in the group-by: the function accepts a series, a trigger value to match on, and a start_stop flag to fine-tune the behaviour.
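To make the behaviour concrete, running it against the mark column of the sorted dataframe should give a mask roughly like this (True for the rows between, and including, each pair of "act" markers):

block_diff(df['mark'], "act", True).tolist()
# [False, True, True, True, True, False, False, False,    <- APPLE rows
#  False, True, True, True, True, True, False, False]     <- ORANGE rows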
Applying that to the dataframe, I use the returned True/False values as a boolean index to copy the grouping variable (symbol) into a new field called act_block. That gives me a unique grouping field that also functions as a start-stop filter, since only rows inside a block get a value. At the same time, I create an additional column called act_sequence, which we'll use later to identify the starting row of each sub-group:
df['act_block'] = df[block_diff(df['mark'], "act", True)]['symbol']
df['act_sequence'] = df.groupby("act_block").cumcount()
df
          date  symbol     open     high      low    close mark act_block  act_sequence
0   03/03/2021   APPLE   732.00   754.95   723.40   729.85            NaN             0
1   04/03/2021   APPLE   733.25   765.70   715.85   752.45  act     APPLE             0
2   05/03/2021   APPLE   752.45   761.00   730.50   748.95          APPLE             1
3   08/03/2021   APPLE   762.70   767.80   744.20   748.40          APPLE             2
4   09/03/2021   APPLE   755.55   759.40   738.65   750.75  act     APPLE             3
5   10/03/2021   APPLE   757.50   753.10   743.00   745.35            NaN             1
6   12/03/2021   APPLE   743.00   752.10   723.00   728.15            NaN             2
7   15/03/2021   APPLE   727.80   727.80   706.05   719.05            NaN             3
8   03/03/2021  ORANGE  2406.00  2417.70  2375.80  2402.10            NaN             4
9   04/03/2021  ORANGE  2380.00  2435.00  2350.00  2417.10  act    ORANGE             0
10  05/03/2021  ORANGE  2399.00  2423.90  2377.10  2387.10         ORANGE             1
11  08/03/2021  ORANGE  2383.00  2413.50  2360.05  2382.70         ORANGE             2
12  09/03/2021  ORANGE  2400.00  2444.00  2396.15  2422.70         ORANGE             3
13  10/03/2021  ORANGE  2446.00  2446.00  2415.55  2431.95  act    ORANGE             4
14  12/03/2021  ORANGE  2442.80  2464.65  2397.00  2401.35            NaN             5
15  15/03/2021  ORANGE  2402.55  2427.55  2343.05  2355.00            NaN             6
Now we can do a simple groupby on act_block, saving the results into a series called max_groups:
max_groups = df.groupby("act_block")["high"].max()
act_block
APPLE 767.8
ORANGE 2446.0
Name: high, dtype: float64
Take this series and merge it back onto the original dataframe. Because df was given a fresh RangeIndex by reset_index, the merged result's index lines up with it row for row, so filtering on act_sequence == 0 leaves a max_vals object indexed by exactly the rows we want. That lets us use pd.concat to selectively join the two objects together and produce the intended output.
# 'high_y' is the block max brought in by the merge ('high_x' is the original high column)
max_vals = (
    df.merge(max_groups, left_on=["act_block"], right_on="act_block", how="left")
    [df['act_sequence'] == 0]       # keep only the rows where act_sequence == 0
    .fillna("")['high_y']
)
max_vals.name = "max_val"
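# At this point max_vals should look roughly like this: a short object-dtype Series
# indexed by the rows where act_sequence == 0 (row 0 sits outside any block, so its
# merged-in max is just the empty string from fillna):
#
# 0
# 1     767.8
# 9    2446.0
# Name: max_val, dtype: object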
new_df = pd.concat([df, max_vals], axis=1).fillna("")
new_df = new_df[["date","symbol","open","high","low","close","mark","max_val"]]
new_df
          date  symbol     open     high      low    close mark max_val
0   03/03/2021   APPLE   732.00   754.95   723.40   729.85
1   04/03/2021   APPLE   733.25   765.70   715.85   752.45  act   767.8
2   05/03/2021   APPLE   752.45   761.00   730.50   748.95
3   08/03/2021   APPLE   762.70   767.80   744.20   748.40
4   09/03/2021   APPLE   755.55   759.40   738.65   750.75  act
5   10/03/2021   APPLE   757.50   753.10   743.00   745.35
6   12/03/2021   APPLE   743.00   752.10   723.00   728.15
7   15/03/2021   APPLE   727.80   727.80   706.05   719.05
8   03/03/2021  ORANGE  2406.00  2417.70  2375.80  2402.10
9   04/03/2021  ORANGE  2380.00  2435.00  2350.00  2417.10  act    2446
10  05/03/2021  ORANGE  2399.00  2423.90  2377.10  2387.10
11  08/03/2021  ORANGE  2383.00  2413.50  2360.05  2382.70
12  09/03/2021  ORANGE  2400.00  2444.00  2396.15  2422.70
13  10/03/2021  ORANGE  2446.00  2446.00  2415.55  2431.95  act
14  12/03/2021  ORANGE  2442.80  2464.65  2397.00  2401.35
15  15/03/2021  ORANGE  2402.55  2427.55  2343.05  2355.00