Here are a few different ways to do this using groupby.
Method 1
Use the .agg method to apply different functions to the different columns.
# Map every column except the identifiers and the grouping key to 'sum',
# then override 'Rural Literacy Rate' to use 'mean'.
# ('Year' must be excluded here because it is consumed as the grouping key.)
d = {col: 'sum' for col in df.columns.drop(['State', 'District', 'Year', 'Rural Literacy Rate'])}
d['Rural Literacy Rate'] = 'mean'
gb = df.groupby('Year')
gb.agg(d)
Method 2
Slice according to columns, use built-in aggregation methods and then concatenate
import pandas as pd

sum_cols = df.columns.drop(['State', 'District', 'Year', 'Rural Literacy Rate'])
mean_cols = ['Rural Literacy Rate']
gb = df.groupby('Year')
# pd.concat needs the pieces wrapped in a list
pd.concat([gb[sum_cols].sum(), gb[mean_cols].mean()], axis=1)
Method 3
Slice according to columns, use .apply, and then concatenate
import numpy as np
import pandas as pd

sum_cols = df.columns.drop(['State', 'District', 'Year', 'Rural Literacy Rate'])
mean_cols = ['Rural Literacy Rate']
gb = df.groupby('Year')
# .apply runs the given function on each group's sub-DataFrame
pd.concat([gb[sum_cols].apply(np.sum), gb[mean_cols].apply(np.mean)], axis=1)
All three methods lead to the same column names, although it could be helpful to rename them to indicate the aggregations that were performed.
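For example, here is one sketch of that renaming applied to Method 2 (the _sum/_mean suffixes are just illustrative):
result = pd.concat([gb[sum_cols].sum().add_suffix('_sum'),
                    gb[mean_cols].mean().add_suffix('_mean')], axis=1)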
Methods 1 and 3 are nice because you can use other functions besides the standard built-in aggregations (sum, count, mean, etc.).
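For instance, any function that reduces a column to a single value can be plugged into the dictionary from Method 1; here is a sketch with a hypothetical peak-to-peak helper (value_range is just an illustrative name):
def value_range(s):
    # custom aggregation: spread of the values within each group
    return s.max() - s.min()

d['Rural Literacy Rate'] = value_range
gb.agg(d)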
Following this answer, you can wrap this all up in a custom function and use .apply, which has the added benefit of naming the result columns at the same time.
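A minimal sketch of that approach, reusing sum_cols from above (the summarize name and the suffixes are assumptions, not taken from the linked answer):
def summarize(g):
    # g is the sub-DataFrame for one year; the index labels of the
    # returned Series become the output column names
    out = g[sum_cols].sum().add_suffix('_sum')
    out['Rural Literacy Rate_mean'] = g['Rural Literacy Rate'].mean()
    return out

df.groupby('Year').apply(summarize)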