
I have several lists of column names from a dataframe. After a groupby, I want to apply a different aggregation function to the columns in each list.

My naive attempt, which fails, was the following:

import pandas as pd
import seaborn as sns

mpg = sns.load_dataset('mpg')

variables_to_mean = ['cylinders', 'displacement']
variables_to_median = ['weight', 'horsepower']

mpg.groupby(['model_year', 'origin']).agg({ variables_to_mean : 'mean', variables_to_median : 'median'})

TypeError: unhashable type: 'list'

How can I achieve my objective?

halfer
user8270077

1 Answer


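A list cannot be a dictionary key because it is not hashable, which is what the TypeError reports. Instead, build one dictionary per list with dict.fromkeys, which maps every column name in the list to the same aggregation function, then merge the dictionaries and pass the result to agg: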

variables_to_mean = ['cylinders', 'displacement']
variables_to_median = ['weight', 'horsepower']

d = {**dict.fromkeys(variables_to_mean, 'mean'), **dict.fromkeys(variables_to_median, 'median')}
print(d)
{'cylinders': 'mean', 'displacement': 'mean', 'weight': 'median', 'horsepower': 'median'}

df = mpg.groupby(['model_year', 'origin']).agg(d)

print(df.head())
                   cylinders  displacement  weight  horsepower
model_year origin                                             
70         europe   4.000000    107.800000  2375.0        90.0
           japan    4.000000    105.000000  2251.0        91.5
           usa      7.636364    336.909091  3651.0       167.5
71         europe   4.000000     95.000000  2069.5        73.0
           japan    4.000000     88.250000  1951.5        78.5
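
With more than two lists, the same idea can be wrapped in a small helper. This is only a sketch and not part of the original answer: build_agg_spec and the toy frame are hypothetical names, and the toy values are made up so that the example runs without downloading the seaborn dataset.

```python
import pandas as pd


def build_agg_spec(groups):
    """Turn {function_name: [columns]} into {column: function_name} for .agg()."""
    spec = {}
    for func, columns in groups.items():
        for col in columns:
            # A plain dict can hold only one function per column, so reject duplicates.
            if col in spec:
                raise ValueError(f"column {col!r} is assigned to more than one function")
            spec[col] = func
    return spec


# Small stand-in for the mpg data (values are invented for illustration).
toy = pd.DataFrame({
    "model_year": [70, 70, 70, 71],
    "origin": ["usa", "usa", "japan", "usa"],
    "cylinders": [8, 6, 4, 8],
    "displacement": [300.0, 200.0, 100.0, 350.0],
    "weight": [3500, 3000, 2000, 4000],
    "horsepower": [150.0, 100.0, 90.0, 170.0],
})

spec = build_agg_spec({
    "mean": ["cylinders", "displacement"],
    "median": ["weight", "horsepower"],
})
result = toy.groupby(["model_year", "origin"]).agg(spec)
print(result)
```

The helper raises an error when a column appears in two lists, since a dictionary of this form can only carry one function per column.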
jezrael