As Quang pointed out, chaining actually isn't needed here (except to chain sort_values
if you want to match the expected output from your original question):
Either use a boolean filter:
df[df.index >= df.groupby('ts_code')['high'].transform('idxmax')].sort_values('ts_code')
# ts_code low high
# 2021-08-22 881105.TI 1656.0 1804.0
# 2021-08-15 885452.TI 2352.0 2459.0
# 2021-08-22 885452.TI 2329.0 2415.0
# 2021-08-01 885525.TI 7427.0 8552.0
# 2021-08-08 885525.TI 7300.0 7868.0
# 2021-08-15 885525.TI 7525.0 8236.0
# 2021-08-22 885525.TI 7400.0 8270.0
# 2021-08-22 885641.TI 691.0 720.0
Or a loc
callable:
(df.sort_values('ts_code')
.loc[lambda x: x.index >= x.groupby('ts_code')['high'].transform('idxmax')])
Convert existing code with pipe
But if you want to convert the existing code into a chain, pipe
the previous operation's output as input for the next operation.
First assign
the maxdate
for each group and pipe
the result into a date filter. Then chain sort_values
to match the expected output from your original question:
(df.assign(maxdate=df.groupby('ts_code')['high'].transform('idxmax'))
.pipe(lambda x: x[x.index >= x.maxdate])
.drop(columns='maxdate')
.sort_values('ts_code'))
# ts_code low high
# 2021-08-22 881105.TI 1656.0 1804.0
# 2021-08-15 885452.TI 2352.0 2459.0
# 2021-08-22 885452.TI 2329.0 2415.0
# 2021-08-01 885525.TI 7427.0 8552.0
# 2021-08-08 885525.TI 7300.0 7868.0
# 2021-08-15 885525.TI 7525.0 8236.0
# 2021-08-22 885525.TI 7400.0 8270.0
# 2021-08-22 885641.TI 691.0 720.0
I'm not sure why BENY's answer included an extra cumcount
filter, but you can also pipe
that if you want (however I don't think this is what your original question asks for):
(df.assign(maxdate=df.groupby('ts_code')['high'].transform('idxmax'))
.pipe(lambda x: x[x.index >= x.maxdate])
.pipe(lambda x: x[x.groupby('ts_code').cumcount() <= 1]))
# ts_code low high maxdate
# 2021-08-01 885525.TI 7427.0 8552.0 2021-08-01
# 2021-08-08 885525.TI 7300.0 7868.0 2021-08-01
# 2021-08-15 885452.TI 2352.0 2459.0 2021-08-15
# 2021-08-22 881105.TI 1656.0 1804.0 2021-08-22
# 2021-08-22 885452.TI 2329.0 2415.0 2021-08-15
# 2021-08-22 885641.TI 691.0 720.0 2021-08-22
You can also use groupby.apply
to index all events later than group['high'].idxmax()
:
(df.groupby('ts_code')
.apply(lambda group: group[group.index >= group['high'].idxmax()])
.droplevel(0)
.sort_values('ts_code'))
# ts_code low high
# 2021-08-22 881105.TI 1656.0 1804.0
# 2021-08-15 885452.TI 2352.0 2459.0
# 2021-08-22 885452.TI 2329.0 2415.0
# 2021-08-01 885525.TI 7427.0 8552.0
# 2021-08-08 885525.TI 7300.0 7868.0
# 2021-08-15 885525.TI 7525.0 8236.0
# 2021-08-22 885525.TI 7400.0 8270.0
# 2021-08-22 885641.TI 691.0 720.0