0

I am confused as to how I can sum rows together and also average rows too. For my data, it is a panel from 2007 to 2013 for a set of districts in India.

The pictures below describe my data.

enter image description here

enter image description here

For my data, I want to add all the columns except Rural Literacy Rate, but I wish to add them according to each year. So I would like the 2007 values to be added to 2007 values, 2008 to 2008, and so on. I wish to add the Adilabad district values to the Anantapur district values. Lastly, for Rural Literacy Rate, I wish to average the values, but in the same way by year by year. I tried grouping but I became lost because I was not able to group by year.

Shankar Ganesh Jayaraman
  • 1,401
  • 1
  • 16
  • 22
Krunal Desai
  • 39
  • 1
  • 2
  • 2
    do `df.head(5).to_dict()` and include that in your question, instead of images. Please see this link for more help on how to create a MRE (minimum reproducible example): https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Dec 23 '20 at 01:44

1 Answers1

0

Here's some different ways to do this using groupby.

Method 1

Use the .agg method to apply different functions to the different columns.

d = {}
for col in df.columns.drop(['State', 'District', 'Rural Literacy Rate']):
    d[col] = 'sum'
d['Rural Literacy Rate'] = 'mean'
gb = df.groupby('Year')
gb.agg(d)

Method 2

Slice according to columns, use built-in aggregation methods and then concatenate

sum_cols = df.columns.drop(['State', 'District', 'Rural Literacy Rate']):
mean_cols = ['Rural Literacy Rate']
gb = df.groupby('Year')
pd.concat(gb[sum_cols].sum(), gb[mean_cols].mean(), axis=1)

Method 3

Slice according to columns, use .apply and then concatenate

import numpy as np

sum_cols = df.columns.drop(['State', 'District', 'Rural Literacy Rate']):
mean_cols = ['Rural Literacy Rate']
gb = df.groupby('Year')
pd.concat(gb[sum_cols].apply(np.sum), gb[mean_cols].apply(np.mean), axis=1)

All three methods lead to the same column names, although it could be helpful to rename them to indicate the aggregations that were performed.

Method 1 and 3 are nice because you can use other functions besides the standard built-in aggregations (sum, count, mean, etc).

Following this answer, you can wrap this all up in a custom function and use .apply, which has the added benefit of giving the columns added names at the same time.

rmwenz
  • 3,665
  • 1
  • 8
  • 8