pandas: add a total row for each subgroup when groupby (especially for non-additive method such as `nunique`)

Question

It's easy to produces multi-level groupby result like this

                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0

The code would look like df.groupby(['animal', 'type'])['speed'].max()

However, if I want to add a total row to each subgroup, to produce something like this

                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
       overall      390.0
Parrot Captive       30.0
       Wild          20.0
       overall       30.0

How can I do it?

The reason for adding the sub-level-row, is that it enables choosing category when I put it into BI tools for other colleagues.

UPDATE: in the example above I show using max(), I also want to know how to do it with user_id.nunique().

Right now I produce the result by 2 groupby and then concat them. something like

df1 = df.groupby(['animal', 'type'])['speed'].max()
df2 = df.groupby(['animal'])['speed'].max()
##### ...  manually add `overall` index to df_2
df_total = pd.concat([df1, df2]).sort_index()

but it seems bit too manual, is there better approach?

score 3 · Answer 1 · answered Jul 22 '19 at 13:56

3

You can do this with 2 concat's, starting from your groupby result.

g = df.groupby(level=0).max()

m = pd.concat([g], keys=['overall'], names=['Type']).swaplevel(0, 1)

pd.concat([df, m], axis=0).sort_index(level=0)

                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
       overall      390.0
Parrot Captive       30.0
       Wild          20.0
       overall       30.0

answered Jul 22 '19 at 13:56

user3483203

50,081
9
65
94

What if I use something like `.nunique()` as aggregate method, will this approach still valid? – ZK Zhao Jul 24 '19 at 09:05

Brendan · Answer 2 · 2019-07-22T14:00:37.580

Adapting from this answer:

# Create data
np.random.seed(2019)
df = pd.DataFrame({ 
    'animal' : np.repeat(['Falcon', 'Parrot'], 10),
    'type' : np.tile(['Captive','Wild'], 10),
    'speed' : np.random.uniform(10,20,20)})
df.loc[df['animal'] == 'Falcon', 'speed'] = df['speed'] * 3
df.loc[df['type'] == 'Captive', 'speed'] = df['speed'] * .7

# Summary table
table = df.pivot_table(index=['animal','type'], values='speed', aggfunc=max)
# or... table = df.groupby(['animal','type'])['speed'].max().to_frame()

pd.concat([d.append(d.max().rename((k, 'Total')))
           for k, d in table.groupby(level=0) 
          ]).append(table.max().rename(('Grand','Total')))

gives

                    speed
animal type              
Falcon Captive  39.973127
       Wild     57.096185
       Total    57.096185
Parrot Captive  10.167126
       Wild     19.847235
       Total    19.847235
Grand  Total    57.096185

If you don't want the grand total, you can remove .append(table.max().rename(('Grand','Total')))

score 1 · Answer 3 · answered Jul 22 '19 at 17:33

1

An efficient way is :

df1 = df.groupby(['animal', 'type'])['speed'].max()

pd.concat([df1.reset_index(level='type'), pd.DataFrame(df1.max(level=0)).assign(
    type='overall')]).set_index('type',append=True).sort_index(level=0)

Out:

               speed
animal type              
Falcon Captive  19.238636
       Wild     19.607617
       overall  19.607617
Parrot Captive  18.386352
       Wild     17.735187
       overall  18.386352

answered Jul 22 '19 at 17:33

DeepBlue

415
4
9

What if I use something like .nunique() as aggregate method, is this approach still valid? – ZK Zhao Jul 24 '19 at 09:06
it should work as well. In general, any aggregation works. – DeepBlue Jul 24 '19 at 10:45

pandas: add a total row for each subgroup when groupby (especially for non-additive method such as `nunique`)

3 Answers3