
Problem: see the linked post.

Goal

Use pandas method chaining to get the output below.

df1 = df[df.index >= df.groupby('ts_code')['high'].transform('idxmax')]
out = df1[df1.groupby('ts_code').cumcount() <= 1]

out
              ts_code     low    high
2021-08-01  885525.TI  7427.0  8552.0
2021-08-08  885525.TI  7300.0  7868.0
2021-08-15  885452.TI  2352.0  2459.0
2021-08-22  881105.TI  1656.0  1804.0
2021-08-22  885452.TI  2329.0  2415.0
2021-08-22  885641.TI   691.0   720.0
Jack

1 Answer


As Quang pointed out, chaining actually isn't needed here (except to chain sort_values if you want to match the expected output from your original question):

  • Either use a boolean filter:

    df[df.index >= df.groupby('ts_code')['high'].transform('idxmax')].sort_values('ts_code')
    
    #               ts_code     low    high
    # 2021-08-22  881105.TI  1656.0  1804.0
    # 2021-08-15  885452.TI  2352.0  2459.0
    # 2021-08-22  885452.TI  2329.0  2415.0
    # 2021-08-01  885525.TI  7427.0  8552.0
    # 2021-08-08  885525.TI  7300.0  7868.0
    # 2021-08-15  885525.TI  7525.0  8236.0
    # 2021-08-22  885525.TI  7400.0  8270.0
    # 2021-08-22  885641.TI   691.0   720.0
    
  • Or a loc callable:

    (df.sort_values('ts_code')
       .loc[lambda x: x.index >= x.groupby('ts_code')['high'].transform('idxmax')])
    
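The transform('idxmax') comparison can be hard to parse at first glance, so here is a minimal, self-contained sketch (the ts_code values and prices are invented for illustration) showing why it keeps each group's rows from its peak date onward:

```python
import pandas as pd

# Made-up data in the shape of the question: a DatetimeIndex,
# a ts_code group column, and a high price column.
df = pd.DataFrame(
    {
        "ts_code": ["A", "A", "A", "B", "B", "B"],
        "high": [10, 30, 20, 5, 1, 9],
    },
    index=pd.to_datetime(["2021-08-01", "2021-08-08", "2021-08-15"] * 2),
)

# For each row, the date on which its group's high peaked:
# A peaks on 2021-08-08, B peaks on 2021-08-15.
peak_date = df.groupby("ts_code")["high"].transform("idxmax")

# Keep only rows on or after their group's peak date.
out = df[df.index >= peak_date]
```

Because transform broadcasts the per-group idxmax back to every row, the comparison is a plain elementwise filter and no merge or apply is needed.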

Convert existing code with pipe

But if you want to convert the existing code into a chain, pipe the previous operation's output as input for the next operation.
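As background, pipe simply calls a function with the frame as its first argument, so df.pipe(f, arg) is the same as f(df, arg) but sits naturally inside a chain. A minimal sketch with an invented helper:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

def add_total(frame, col):
    # Hypothetical helper: attach a running total of `col`.
    return frame.assign(total=frame[col].cumsum())

# These two calls are equivalent; the second form keeps
# the method chain flowing left to right.
a = add_total(df, "x")
b = df.pipe(add_total, "x")
```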

First assign the maxdate for each group and pipe the result into a date filter. Then chain sort_values to match the expected output from your original question:

(df.assign(maxdate=df.groupby('ts_code')['high'].transform('idxmax'))
   .pipe(lambda x: x[x.index >= x.maxdate])
   .drop(columns='maxdate')
   .sort_values('ts_code'))

#               ts_code     low    high
# 2021-08-22  881105.TI  1656.0  1804.0
# 2021-08-15  885452.TI  2352.0  2459.0
# 2021-08-22  885452.TI  2329.0  2415.0
# 2021-08-01  885525.TI  7427.0  8552.0
# 2021-08-08  885525.TI  7300.0  7868.0
# 2021-08-15  885525.TI  7525.0  8236.0
# 2021-08-22  885525.TI  7400.0  8270.0
# 2021-08-22  885641.TI   691.0   720.0

I'm not sure why BENY's answer included an extra cumcount filter, but you can also pipe that step if you want (though I don't think it's what your original question asked for):

(df.assign(maxdate=df.groupby('ts_code')['high'].transform('idxmax'))
   .pipe(lambda x: x[x.index >= x.maxdate])
   .pipe(lambda x: x[x.groupby('ts_code').cumcount() <= 1]))

#               ts_code     low    high    maxdate
# 2021-08-01  885525.TI  7427.0  8552.0 2021-08-01
# 2021-08-08  885525.TI  7300.0  7868.0 2021-08-01
# 2021-08-15  885452.TI  2352.0  2459.0 2021-08-15
# 2021-08-22  881105.TI  1656.0  1804.0 2021-08-22
# 2021-08-22  885452.TI  2329.0  2415.0 2021-08-15
# 2021-08-22  885641.TI   691.0   720.0 2021-08-22
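For reference, groupby.cumcount numbers the rows within each group starting from 0, so <= 1 keeps at most the first two rows per group. A toy sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "a", "b", "b", "b"],
                   "v": [1, 2, 3, 4, 5, 6]})

# Row number within each group: a -> 0, 1, 2; b -> 0, 1, 2.
rank_in_group = df.groupby("g").cumcount()

# Keep at most the first two rows of each group.
first_two = df[rank_in_group <= 1]
```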

groupby.apply alternative

You can also use groupby.apply to select, within each group, all rows at or after group['high'].idxmax():

(df.groupby('ts_code')
   .apply(lambda group: group[group.index >= group['high'].idxmax()])
   .droplevel(0)
   .sort_values('ts_code'))

#               ts_code     low    high
# 2021-08-22  881105.TI  1656.0  1804.0
# 2021-08-15  885452.TI  2352.0  2459.0
# 2021-08-22  885452.TI  2329.0  2415.0
# 2021-08-01  885525.TI  7427.0  8552.0
# 2021-08-08  885525.TI  7300.0  7868.0
# 2021-08-15  885525.TI  7525.0  8236.0
# 2021-08-22  885525.TI  7400.0  8270.0
# 2021-08-22  885641.TI   691.0   720.0
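If you'd rather skip the droplevel step, passing group_keys=False to groupby should keep the original index in recent pandas versions. A sketch on invented data:

```python
import pandas as pd

df = pd.DataFrame(
    {"ts_code": ["A", "A", "B", "B"], "high": [1, 5, 7, 2]},
    index=pd.to_datetime(["2021-08-01", "2021-08-08"] * 2),
)

# group_keys=False leaves the original DatetimeIndex in place,
# so no .droplevel(0) is needed afterwards.
out = (df.groupby("ts_code", group_keys=False)
         .apply(lambda g: g[g.index >= g["high"].idxmax()]))
```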
tdy