Flatten hierarchically indexed pandas.DataFrame from groupby and multiple aggregation

Question

I'm grouping a dataframe by multiple columns and aggregating to obtain multiple statistics. How to obtain a totally flat structure with each possible combination of group-keys enumerated as rows and each statistic present as columns?

import numpy as np
import pandas as pd

cities = ['Berlin', 'Oslo']
days = ['Monday', 'Friday']

data = pd.DataFrame({
        'city': np.random.choice(cities, 12),
        'day': np.random.choice(days, 12),
        'people': np.random.normal(loc=10, size=12),
        'cats': np.random.normal(loc=6, size=12)})
grouped = data.groupby(['city', 'day']).agg([np.mean, np.std])

This way I'm getting:

                   cats               people          
                   mean       std       mean       std
city   day                                            
Berlin Friday  6.146924  0.721263  10.445606  0.730992
       Monday  5.239267       NaN   9.022811       NaN
Oslo   Friday  6.322276  0.866899  11.579813  0.114341
       Monday  5.028919  0.815674  10.458439  1.182689

I need to get it flat:

city   day     cats_mean cats_std  people_mean people_std                                       
Berlin Friday  6.146924  0.721263  10.445606   0.730992
Berlin Monday  5.239267       NaN   9.022811        NaN
Oslo   Friday  6.322276  0.866899  11.579813   0.114341
Oslo   Monday  5.028919  0.815674  10.458439   1.182689

You can just call `grouped.reset_index()` to restore the indices back to columns — EdChum, May 08 '17 at 13:41
@EdChum it still leaves a MultiIndex there, making it hard to manipulate aggregated statistics as columns — adam.ra, May 08 '17 at 13:43
Assign back `grouped = grouped.reset_index()`, OK you want to flatten the columns also — EdChum, May 08 '17 at 13:44
Won't help, still can't refer to columns, e.g., grouped['mean'] raises a KeyError — adam.ra, May 08 '17 at 13:46

score 13 · Accepted Answer · answered May 08 '17 at 13:54

In [36]: grouped.columns = grouped.columns.map('_'.join)

In [37]: grouped = grouped.reset_index()

In [38]: grouped
Out[38]:
     city     day  cats_mean  cats_std  people_mean  people_std
0  Berlin  Friday   5.852991  1.085163    11.078541    0.839688
1  Berlin  Monday   6.978343  0.630983     9.876106    1.846204
2    Oslo  Friday   6.096773  1.278176     9.710216    0.691672

Scott Boston · Answer 2 · 2021-03-18T01:28:27.670

Update using pd.NamedAgg

import numpy as np
import pandas as pd

cities = ['Berlin', 'Oslo']
days = ['Monday', 'Friday']

data = pd.DataFrame({
        'city': np.random.choice(cities, 12),
        'day': np.random.choice(days, 12),
        'people': np.random.normal(loc=10, size=12),
        'cats': np.random.normal(loc=6, size=12)})
grouped = data.groupby(['city', 'day']).agg(cats_mean=('cats', np.mean),
                                            cats_std=('cats', np.std),
                                            people_mean=('people', np.mean),
                                            people_std=('people', np.std))

grouped.reset_index()

Nest renaming is deprecated

You can use a dictionary in .agg to rename your columns then drop column level and reset_index(): See this SO Post

import numpy as np import pandas as pd cities = ['Berlin', 'Oslo'] days = ['Monday', 'Friday'] data = pd.DataFrame({ 'city': np.random.choice(cities, 12), 'day': np.random.choice(days, 12), 'people': np.random.normal(loc=10, size=12), 'cats': np.random.normal(loc=6, size=12)}) grouped = data.groupby(['city', 'day']).agg({'cats':{'cats_mean':np.mean,'cats_std':np.std},'people':{'people_mean':np.mean,'people_std':np.std}}) grouped.columns = grouped.columns.droplevel() grouped.reset_index() ---------------

Output:

     city     day  people_mean  people_std  cats_std  cats_mean
0  Berlin  Friday     9.645190    0.699684  0.973866   6.478510
1  Berlin  Monday     9.556898    0.126810  0.336654   6.624288
2    Oslo  Friday    11.593491         NaN       NaN   6.206595
3    Oslo  Monday    10.202183    1.058651  0.657939   6.019748

Please be aware of `FutureWarning: using a dict with renaming is deprecated and will be removed in a future version`. Here is a [link](http://pandas.pydata.org/pandas-docs/version/0.20/whatsnew.html#deprecate-groupby-agg-with-a-dictionary-when-renaming) — MaxU - stand with Ukraine, May 08 '17 at 14:02

score 2 · Answer 3 · answered May 08 '17 at 13:50

You can perform a list comprehension on the column levels and join with an underscore and then call reset_index:

In [39]:    
grouped.columns= ['_'.join(x) for x in list(zip(grouped.columns.get_level_values(0), grouped.columns.get_level_values(1)))]
grouped = grouped.reset_index()
grouped

Out[39]:
     city     day  cats_mean  cats_std  people_mean  people_std
0  Berlin  Friday   6.140710  0.555981    10.187634    0.359724
1  Berlin  Monday   6.420175  0.986568    10.134376    0.963938
2    Oslo  Friday   6.978572  0.573297    11.345484    1.454762
3    Oslo  Monday   4.594814       NaN    10.842988         NaN

Flatten hierarchically indexed pandas.DataFrame from groupby and multiple aggregation

3 Answers3