1

It's easy to produces multi-level groupby result like this

                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0

The code would look like df.groupby(['animal', 'type'])['speed'].max()

However, if I want to add a total row to each subgroup, to produce something like this

                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
       overall      390.0
Parrot Captive       30.0
       Wild          20.0
       overall       30.0

How can I do it?

The reason for adding the sub-level-row, is that it enables choosing category when I put it into BI tools for other colleagues.

UPDATE: in the example above I show using max(), I also want to know how to do it with user_id.nunique().


Right now I produce the result by 2 groupby and then concat them. something like

df1 = df.groupby(['animal', 'type'])['speed'].max()
df2 = df.groupby(['animal'])['speed'].max()
##### ...  manually add `overall` index to df_2
df_total = pd.concat([df1, df2]).sort_index()

but it seems bit too manual, is there better approach?

ZK Zhao
  • 19,885
  • 47
  • 132
  • 206

3 Answers3

3

You can do this with 2 concat's, starting from your groupby result.


g = df.groupby(level=0).max()

m = pd.concat([g], keys=['overall'], names=['Type']).swaplevel(0, 1)

pd.concat([df, m], axis=0).sort_index(level=0)

                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
       overall      390.0
Parrot Captive       30.0
       Wild          20.0
       overall       30.0
user3483203
  • 50,081
  • 9
  • 65
  • 94
1

Adapting from this answer:

# Create data
np.random.seed(2019)
df = pd.DataFrame({ 
    'animal' : np.repeat(['Falcon', 'Parrot'], 10),
    'type' : np.tile(['Captive','Wild'], 10),
    'speed' : np.random.uniform(10,20,20)})
df.loc[df['animal'] == 'Falcon', 'speed'] = df['speed'] * 3
df.loc[df['type'] == 'Captive', 'speed'] = df['speed'] * .7

# Summary table
table = df.pivot_table(index=['animal','type'], values='speed', aggfunc=max)
# or... table = df.groupby(['animal','type'])['speed'].max().to_frame()

pd.concat([d.append(d.max().rename((k, 'Total')))
           for k, d in table.groupby(level=0) 
          ]).append(table.max().rename(('Grand','Total')))

gives

                    speed
animal type              
Falcon Captive  39.973127
       Wild     57.096185
       Total    57.096185
Parrot Captive  10.167126
       Wild     19.847235
       Total    19.847235
Grand  Total    57.096185

If you don't want the grand total, you can remove .append(table.max().rename(('Grand','Total')))

Brendan
  • 3,901
  • 15
  • 23
1

An efficient way is :

df1 = df.groupby(['animal', 'type'])['speed'].max()

pd.concat([df1.reset_index(level='type'), pd.DataFrame(df1.max(level=0)).assign(
    type='overall')]).set_index('type',append=True).sort_index(level=0)

Out:

               speed
animal type              
Falcon Captive  19.238636
       Wild     19.607617
       overall  19.607617
Parrot Captive  18.386352
       Wild     17.735187
       overall  18.386352
DeepBlue
  • 415
  • 4
  • 9