I currently have this function:
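import numpy as np

def process_data(data):
    # Keep bucket 25, then total the forecast per month and agent.
    data = data[data['Bucket Number'] == 25.0].groupby(['Activity Month', 'Agent Sign']).agg({'Total Ping Current Forecast': [np.sum]})
    # Sort by month, then by sum (descending), and keep the top 3 rows per month.
    data = data.sort_values(['Activity Month', ('Total Ping Current Forecast', 'sum')], ascending=[True, False]).groupby(level=0).head(3)
    return data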
Which produces this output:
                           Total Ping Current Forecast
                                                   sum
Activity Month Agent Sign
202001         {Various}                  1.305513e+09
               HDQGR1                     2.171435e+08
               CRCTLD                     4.774614e+07
202002         {Various}                  1.159181e+09
               HDQGR1                     1.912536e+08
               CRCTLD                     4.573402e+07
202003         {Various}                  1.090292e+09
               HDQGR1                     1.852591e+08
               CRCTLD                     4.045673e+07
I want to remove the first row of each group so that the output looks like this:
                           Total Ping Current Forecast
                                                   sum
Activity Month Agent Sign
202001         HDQGR1                     2.171435e+08
               CRCTLD                     4.774614e+07
               DFW1DF                     1.622023e+07
202002         HDQGR1                     1.912536e+08
               CRCTLD                     4.573402e+07
               HDQ1ZB                     2.711470e+07
202003         HDQGR1                     1.852591e+08
               CRCTLD                     4.045673e+07
               HDQ1ZB                     1.532134e+07
Essentially, I want the highest value in each group dropped, since the DataFrame is already sorted in descending order by sum.
I found this solution and tried this:
def process_data(data):
    data = data[data['Bucket Number'] == 25.0].groupby(['Activity Month', 'Agent Sign']).agg({'Total Ping Current Forecast': [np.sum]})
    # Attempt: drop the first row with iloc[1:] before taking the top 3 per month.
    data = data.sort_values(['Activity Month', ('Total Ping Current Forecast', 'sum')], ascending=[True, False]).apply(lambda x: x.iloc[1:]).groupby(level=0).head(3)
    return data
But it only applied the function to the first group, giving this result:
                           Total Ping Current Forecast
                                                   sum
Activity Month Agent Sign
202001         HDQGR1                     2.171435e+08
               CRCTLD                     4.774614e+07
               DFW1DF                     1.622023e+07
202002         {Various}                  1.159181e+09
               HDQGR1                     1.912536e+08
               CRCTLD                     4.573402e+07
202003         {Various}                  1.090292e+09
               HDQGR1                     1.852591e+08
               CRCTLD                     4.045673e+07
How do I apply that function to each group in the dataframe?
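
For reference, here is a minimal sketch of one way this could be done, not necessarily the best one. `DataFrame.apply` runs the lambda once per column, so `x.iloc[1:]` removes only the very first row of the whole frame; the slice has to happen per group instead. The sketch numbers the rows within each month using `groupby(level=0).cumcount()` and keeps positions 1 to 3. The name `process_data_skip_top` and the toy frame `raw` are made up for illustration; the values in `raw` are copied from the outputs above, and I am assuming one input row per month and agent.

```python
import pandas as pd

COL = 'Total Ping Current Forecast'

# Toy input (assumption): one row per (month, agent), values taken from the question.
raw = pd.DataFrame({
    'Bucket Number': [25.0] * 12,
    'Activity Month': [202001] * 4 + [202002] * 4 + [202003] * 4,
    'Agent Sign': ['{Various}', 'HDQGR1', 'CRCTLD', 'DFW1DF',
                   '{Various}', 'HDQGR1', 'CRCTLD', 'HDQ1ZB',
                   '{Various}', 'HDQGR1', 'CRCTLD', 'HDQ1ZB'],
    COL: [1.305513e+09, 2.171435e+08, 4.774614e+07, 1.622023e+07,
          1.159181e+09, 1.912536e+08, 4.573402e+07, 2.711470e+07,
          1.090292e+09, 1.852591e+08, 4.045673e+07, 1.532134e+07],
})


def process_data_skip_top(data):
    """Hypothetical variant of process_data: drop the largest sum per month, keep the next 3."""
    data = (data[data['Bucket Number'] == 25.0]
            .groupby(['Activity Month', 'Agent Sign'])
            .agg({COL: ['sum']}))
    data = data.sort_values(['Activity Month', (COL, 'sum')],
                            ascending=[True, False])
    # Position of each row inside its month after sorting (0 = largest sum).
    position = data.groupby(level=0).cumcount()
    # Skip position 0 and keep positions 1, 2 and 3 in every month.
    return data.loc[(position >= 1) & (position <= 3)]


result = process_data_skip_top(raw)
print(result)
```

On the toy data this keeps HDQGR1, CRCTLD and DFW1DF for 202001, and HDQGR1, CRCTLD and HDQ1ZB for the other two months. If a month has fewer than four agents, it simply returns whatever rows remain after the first one.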