How to using chain method to select data after date which is the index of the max value of columns for each group by pandas?

Question

Goal

using pandas chain method to get the output.

df1 = df.reset_index()
df1 = df[df.index >= df.groupby('ts_code')['high'].transform('idxmax')]
out = df1[df1.groupby('ts_code').cumcount()<=1]

out
              ts_code     low    high
2021-08-01  885525.TI  7427.0  8552.0
2021-08-08  885525.TI  7300.0  7868.0
2021-08-15  885452.TI  2352.0  2459.0
2021-08-22  881105.TI  1656.0  1804.0
2021-08-22  885452.TI  2329.0  2415.0
2021-08-22  885641.TI   691.0   720.0

What's wrong with just `df[df.index >= df.groupby('ts_code')['high'].transform('idxmax')]`? — Quang Hoang, Oct 04 '21 at 21:35
Or with chain methods: `df.some_other_op().loc[lambda x: x.index >= x.groupby(...)['high'].transform('idxmax')]`. — Quang Hoang, Oct 04 '21 at 21:36

tdy · Accepted Answer · 2021-10-04T22:02:03.640

As Quang pointed out, chaining actually isn't needed here (except to chain sort_values if you want to match the expected output from your original question):

Either use a boolean filter:

df[df.index >= df.groupby('ts_code')['high'].transform('idxmax')].sort_values('ts_code')

#               ts_code     low    high
# 2021-08-22  881105.TI  1656.0  1804.0
# 2021-08-15  885452.TI  2352.0  2459.0
# 2021-08-22  885452.TI  2329.0  2415.0
# 2021-08-01  885525.TI  7427.0  8552.0
# 2021-08-08  885525.TI  7300.0  7868.0
# 2021-08-15  885525.TI  7525.0  8236.0
# 2021-08-22  885525.TI  7400.0  8270.0
# 2021-08-22  885641.TI   691.0   720.0

Or a loc callable:

(df.sort_values('ts_code')
   .loc[lambda x: x.index >= x.groupby('ts_code')['high'].transform('idxmax')])

Convert existing code with `pipe`

But if you want to convert the existing code into a chain, pipe the previous operation's output as input for the next operation.

First assign the maxdate for each group and pipe the result into a date filter. Then chain sort_values to match the expected output from your original question:

(df.assign(maxdate=df.groupby('ts_code')['high'].transform('idxmax'))
   .pipe(lambda x: x[x.index >= x.maxdate])
   .drop(columns='maxdate')
   .sort_values('ts_code'))

#               ts_code     low    high
# 2021-08-22  881105.TI  1656.0  1804.0
# 2021-08-15  885452.TI  2352.0  2459.0
# 2021-08-22  885452.TI  2329.0  2415.0
# 2021-08-01  885525.TI  7427.0  8552.0
# 2021-08-08  885525.TI  7300.0  7868.0
# 2021-08-15  885525.TI  7525.0  8236.0
# 2021-08-22  885525.TI  7400.0  8270.0
# 2021-08-22  885641.TI   691.0   720.0

I'm not sure why BENY's answer included an extra cumcount filter, but you can also pipe that if you want (however I don't think this is what your original question asks for):

(df.assign(maxdate=df.groupby('ts_code')['high'].transform('idxmax'))
   .pipe(lambda x: x[x.index >= x.maxdate])
   .pipe(lambda x: x[x.groupby('ts_code').cumcount() <= 1]))

#               ts_code     low    high    maxdate
# 2021-08-01  885525.TI  7427.0  8552.0 2021-08-01
# 2021-08-08  885525.TI  7300.0  7868.0 2021-08-01
# 2021-08-15  885452.TI  2352.0  2459.0 2021-08-15
# 2021-08-22  881105.TI  1656.0  1804.0 2021-08-22
# 2021-08-22  885452.TI  2329.0  2415.0 2021-08-15
# 2021-08-22  885641.TI   691.0   720.0 2021-08-22

`groupby.apply` alternative

You can also use groupby.apply to index all events later than group['high'].idxmax():

(df.groupby('ts_code')
   .apply(lambda group: group[group.index >= group['high'].idxmax()])
   .droplevel(0)
   .sort_values('ts_code'))

#               ts_code     low    high
# 2021-08-22  881105.TI  1656.0  1804.0
# 2021-08-15  885452.TI  2352.0  2459.0
# 2021-08-22  885452.TI  2329.0  2415.0
# 2021-08-01  885525.TI  7427.0  8552.0
# 2021-08-08  885525.TI  7300.0  7868.0
# 2021-08-15  885525.TI  7525.0  8236.0
# 2021-08-22  885525.TI  7400.0  8270.0
# 2021-08-22  885641.TI   691.0   720.0

How to using chain method to select data after date which is the index of the max value of columns for each group by pandas?

1 Answers1

Convert existing code with pipe

groupby.apply alternative

Convert existing code with `pipe`

`groupby.apply` alternative