2

Is there a way to compute arbitrary number of different groupby levels in one go with some pre-built Pandas function? Below is a simple example with two columns.

import pandas as pd

df1 = pd.DataFrame( { 
    "name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"], 
    "city" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],  
    "dollars":[1, 1, 1, 1, 1, 1] })

group1 = df1.groupby("city").dollars.sum().reset_index()
group1['name']='All'

group2 = df1.groupby("name").dollars.sum().reset_index()
group2['city']='All'

group3 = df1.groupby(["name", "city"]).dollars.sum().reset_index()

total = df1.dollars.sum()
total_df=pd.DataFrame({ 
    "name" : ["All"], 
    "city" : ["All"],  
    "dollars": [total] })

all_groups = group3.append([group1, group2, total_df], sort=False)


    name    city    dollars
0   Alice   Seattle     1
1   Bob     Seattle     2
2   Mallory Portland    2
3   Mallory Seattle     1
0   All     Portland    2
1   All     Seattle     4
0   Alice   All         1
1   Bob     All         2
2   Mallory All         3
0   All     All         6

So I took Ben. T example and rebuilt it from sum() to agg(). The next step for me is to build an option to pass a specific list of groupby combinations, in case not all of them are needed.

from itertools import combinations
import pandas as pd

df1 = pd.DataFrame( { 
    "name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"], 
    "city" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],  
    "dollars":[1, 2, 6, 5, 3, 4],
    "qty":[2, 3, 4, 1, 5, 6] ,
    "id":[1, 1, 2, 2, 3, 3] 
})

col_gr = ['name', 'city']
agg_func={'dollars': ['sum', 'max', 'count'], 'qty': ['sum'], "id":['nunique']}

def multi_groupby(in_df, col_gr, agg_func, all_value="ALL"):
    tmp1 = pd.DataFrame({**{col: all_value for col in col_gr}}, index=[0])
    tmp2 = in_df.agg(agg_func)\
                .unstack()\
                .to_frame()\
                .transpose()\
                .dropna(axis=1)
    tmp2.columns = ['_'.join(col).strip() for col in tmp2.columns.values]
    total = tmp1.join(tmp2)

    for r in range(len(col_gr), 0, -1):
        for cols in combinations(col_gr, r):
            tmp_grp = in_df.groupby(by=list(cols))\
                .agg(agg_func)\
                .reset_index()\
                .assign(**{col: all_value for col in col_gr if col not in cols})
            tmp_grp.columns = ['_'.join(col).rstrip('_') for col in tmp_grp.columns.values]
            total = pd.concat([total]+[tmp_grp], axis=0, ignore_index=True)
    return total

multi_groupby(df1, col_gr, agg_func)


  • `Is there a way to compute arbitrary number of different groupby levels in one go with some pre-built Pandas function?` No, not exist. – jezrael Feb 12 '20 at 13:24

2 Answers2

1

Assuming you look for a general way to create all the combinations in the groupby, you can use itertools.combinations:

from itertools import combinations

col_gr = ['name', 'city']
col_sum = ['dollars']

all_groups = pd.concat( [ df1.groupby(by=list(cols))[col_sum].sum().reset_index()\
                             .assign(**{col:'all' for col in col_gr if col not in cols})
                         for r in range(len(col_gr), 0, -1) for cols in combinations(col_gr, r) ] 
                      + [ pd.DataFrame({**{col:'all' for col in col_gr}, 
                                        **{col: df1[col].sum() for col in col_sum},}, index=[0])], 
                        axis=0, ignore_index=True)
print (all_groups)

      name      city  dollars
0    Alice   Seattle        1
1      Bob   Seattle        2
2  Mallory  Portland        2
3  Mallory   Seattle        1
4    Alice       all        1
5      Bob       all        2
6  Mallory       all        3
7      all  Portland        2
8      all   Seattle        4
9      all       all        6
Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • this is a very interesting approach! I'm struggling to understand some of the use of ** and assign - can you tell me what this is called in Python so I can google it? – keg5038 Feb 12 '20 at 17:35
  • @keg5038 ** is for unpacking dictionnary and `assign` is a method from pandas, that can be use with unpacked dictionnary. For the {**dict1, **dict2} is used to union several dictionaries, a bit like `update` – Ben.T Feb 12 '20 at 17:39
  • 1
    great, thanks so much! I have some googling to do now! – keg5038 Feb 12 '20 at 17:39
  • 1
    I understood that the way I formulated my question, might have looked that I'm looking only for "all combinations". I'm searching for a way to calculate results on a desired list of combinations ("all combinations" is one of the possible options). Anyway, your example helped me a lot, thank you! same as keg5038, I learned something new :) – Grinvydas Kareiva Feb 13 '20 at 14:53
  • @GrinvydasKareiva ok, I guess if you have the list of combinations, then it is some small changes in the method above. let me know if you are unsure, I can edit the answer to fit your comment :) – Ben.T Feb 13 '20 at 15:07
0

This has been something I've been looking for consistently as well. Here are links to two methods that other people have written up that have helped me with this problem. Would certainly be interested in other takes as well.

Link 1 Link 2

keg5038
  • 341
  • 3
  • 13