I currently have this function:

def process_data(data):
    data = data[data['Bucket Number'] == 25.0].groupby(['Activity Month', 'Agent Sign']).agg({'Total Ping Current Forecast': [np.sum]})
    data = data.sort_values(['Activity Month', ('Total Ping Current Forecast', 'sum')], ascending=[True, False]).groupby(level=0).head(3)
    return data

Which produces this output:

                          Total Ping Current Forecast
                                                  sum
Activity Month Agent Sign                            
202001         {Various}                 1.305513e+09
               HDQGR1                    2.171435e+08
               CRCTLD                    4.774614e+07
202002         {Various}                 1.159181e+09
               HDQGR1                    1.912536e+08
               CRCTLD                    4.573402e+07
202003         {Various}                 1.090292e+09
               HDQGR1                    1.852591e+08
               CRCTLD                    4.045673e+07

I want to remove the first row of each group so that the output looks like this:

                          Total Ping Current Forecast
                                                  sum
Activity Month Agent Sign                            
202001         HDQGR1                    2.171435e+08
               CRCTLD                    4.774614e+07
               DFW1DF                    1.622023e+07
202002         HDQGR1                    1.912536e+08
               CRCTLD                    4.573402e+07
               HDQ1ZB                    2.711470e+07
202003         HDQGR1                    1.852591e+08
               CRCTLD                    4.045673e+07
               HDQ1ZB                    1.532134e+07

Essentially, I want the highest value of each group dropped since the dataframe is already sorted in descending order by sum.

I found this solution and tried this:

def process_data(data):
    data = data[data['Bucket Number'] == 25.0].groupby(['Activity Month', 'Agent Sign']).agg({'Total Ping Current Forecast': [np.sum]})
    data = data.sort_values(['Activity Month', ('Total Ping Current Forecast', 'sum')], ascending=[True, False]).apply(lambda x: x.iloc[1:]).groupby(level=0).head(3)
    return data

But it only dropped the first row of the first group, giving this result:

                          Total Ping Current Forecast
                                                  sum
Activity Month Agent Sign                            
202001         HDQGR1                    2.171435e+08
               CRCTLD                    4.774614e+07
               DFW1DF                    1.622023e+07
202002         {Various}                 1.159181e+09
               HDQGR1                    1.912536e+08
               CRCTLD                    4.573402e+07
202003         {Various}                 1.090292e+09
               HDQGR1                    1.852591e+08
               CRCTLD                    4.045673e+07

How do I apply that function to each group in the dataframe?

quazi_moto

Does this answer your question? [Python: Pandas - Delete the first row by group](https://stackoverflow.com/questions/31226142/python-pandas-delete-the-first-row-by-group) – RichieV Sep 03 '20 at 04:09

1 Answer

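In your original code (without the .apply step), instead of finishing with .head(3) you could use .nth([1, 2]).

This returns the second and third rows of every group, because the index for nth is zero-based. To get the three rows per group shown in your desired output, extend the list to .nth([1, 2, 3]).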

Read more in the pandas documentation for GroupBy.nth.
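
For reference, the attempt in the question fails because DataFrame.apply runs the lambda once per column over the whole frame, so x.iloc[1:] drops only the very first row rather than the first row of each group. Below is a minimal sketch of a per-group alternative that uses groupby(level=0).cumcount() to number the rows inside each month and then keeps positions 1 to 3. It is offered as an option alongside nth because, as far as I know, nth changed in pandas 2.0 to act as a filter, so the shape of its result can differ between versions. The sample frame, its values, and the skip/keep parameters are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
raw = pd.DataFrame({
    "Activity Month": [202001] * 5 + [202002] * 5,
    "Agent Sign": ["{Various}", "A", "B", "C", "D"] * 2,
    "Bucket Number": [25.0] * 10,
    "Total Ping Current Forecast": [900, 400, 300, 200, 100,
                                    800, 350, 250, 150, 50],
})


def process_data(data, skip=1, keep=3):
    # skip and keep are assumed parameters: drop the top `skip` rows of each
    # month, then keep the next `keep` rows.
    col = "Total Ping Current Forecast"
    totals = (
        data[data["Bucket Number"] == 25.0]
        .groupby(["Activity Month", "Agent Sign"])
        .agg({col: ["sum"]})
        .sort_values(["Activity Month", (col, "sum")], ascending=[True, False])
    )
    # Zero-based position of each row within its Activity Month group,
    # in the (already sorted) order the rows appear.
    pos = totals.groupby(level=0).cumcount().to_numpy()
    return totals[(pos >= skip) & (pos < skip + keep)]


result = process_data(raw)
print(result)
```

Because the mask is built from row positions, the original MultiIndex (Activity Month, Agent Sign) is kept as-is in the result.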

RichieV