How do I sum and average rows for a panel dataset in Pandas?

Question

I am confused as to how I can sum rows together and also average rows too. For my data, it is a panel from 2007 to 2013 for a set of districts in India.

The pictures below describe my data.

For my data, I want to add all the columns except Rural Literacy Rate, but I wish to add them according to each year. So I would like the 2007 values to be added to 2007 values, 2008 to 2008, and so on. I wish to add the Adilabad district values to the Anantapur district values. Lastly, for Rural Literacy Rate, I wish to average the values, but in the same way by year by year. I tried grouping but I became lost because I was not able to group by year.

do `df.head(5).to_dict()` and include that in your question, instead of images. Please see this link for more help on how to create a MRE (minimum reproducible example): https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — David Erickson, Dec 23 '20 at 01:44

score 0 · Answer 1 · answered Dec 23 '20 at 03:25

Here's some different ways to do this using groupby.

Method 1

Use the .agg method to apply different functions to the different columns.

d = {}
for col in df.columns.drop(['State', 'District', 'Rural Literacy Rate']):
    d[col] = 'sum'
d['Rural Literacy Rate'] = 'mean'
gb = df.groupby('Year')
gb.agg(d)

Method 2

Slice according to columns, use built-in aggregation methods and then concatenate

sum_cols = df.columns.drop(['State', 'District', 'Rural Literacy Rate']):
mean_cols = ['Rural Literacy Rate']
gb = df.groupby('Year')
pd.concat(gb[sum_cols].sum(), gb[mean_cols].mean(), axis=1)

Method 3

Slice according to columns, use .apply and then concatenate

import numpy as np

sum_cols = df.columns.drop(['State', 'District', 'Rural Literacy Rate']):
mean_cols = ['Rural Literacy Rate']
gb = df.groupby('Year')
pd.concat(gb[sum_cols].apply(np.sum), gb[mean_cols].apply(np.mean), axis=1)

All three methods lead to the same column names, although it could be helpful to rename them to indicate the aggregations that were performed.

Method 1 and 3 are nice because you can use other functions besides the standard built-in aggregations (sum, count, mean, etc).

Following this answer, you can wrap this all up in a custom function and use .apply, which has the added benefit of giving the columns added names at the same time.

Many many many thanks! This helps a lot and I appreciate the detail you gave! — Krunal Desai, Dec 25 '20 at 00:47
You're welcome! Please consider upvoting or selecting as preferred answer if it meets your needs. — rmwenz, Dec 26 '20 at 04:30

How do I sum and average rows for a panel dataset in Pandas?

1 Answers1