Grouping using groupby and sum()-then compute percentages

Question

I would like to find an easy way to compute the percentage of each sub-category within its category after I sum each group. Here is an example:

import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'C1', 'A2', 'B2', 'C2']},
                  columns=['key', 'data1', 'data2'])

df.groupby(['key','data2'])['data1'].sum()

What I would like to do is create an additional column that shows the percentage of each sub-category (e.g., A1) within its respective category (e.g., A). For example, I would like to know the percentages A1/(A1+A2) through C2/(C1+C2).

What's the easiest way to do that, please?

Possible duplicate of [Pandas percentage of total with groupby](https://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby) — Andrew L, Mar 20 '18 at 09:29

Shibani · Accepted Answer · 2018-03-20T09:27:11.750

Please clarify what you mean by the percentage of A1/(A1+A2). Is the column "data2" an integer type?

OK, I presume this should work for you:

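# Sum data1 for every (key, data2) pair; the result is indexed by both levels.
sums = df.groupby(['key', 'data2']).agg({'data1': 'sum'})

# Within each key (index level 0), express every sum as a percentage of that key's total.
percentages = sums.groupby(level=0, group_keys=False).apply(lambda x: 100 * x / x.sum())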
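A possible alternative, sketched here rather than taken from the answer itself, is to broadcast each key's total back onto its rows with `transform('sum')` and divide. This avoids the lambda and keeps the sum and the percentage side by side. The column names `total` and `pct` are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'C1', 'A2', 'B2', 'C2']},
                  columns=['key', 'data1', 'data2'])

# Sum per (key, data2) pair; the result is a Series with a two-level index.
sums = df.groupby(['key', 'data2'])['data1'].sum()

# Total of each key, repeated on every row that belongs to that key.
key_totals = sums.groupby(level='key').transform('sum')

# 'total' and 'pct' are illustrative names, not part of the original answer.
result = sums.to_frame('total')
result['pct'] = 100 * result['total'] / key_totals
print(result)
```

With the sample data, A1 and A2 come out at 0% and 100%, B1 and B2 at 20% and 80%, and C1 and C2 at roughly 28.57% and 71.43%, so each key adds up to 100%.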

edited Mar 20 '18 at 09:27

answered Mar 20 '18 at 08:51

Shibani

Yes, it is an integer. Just to clarify: if the sum of category A (i.e., A1+A2) is 100, of which A1 represents 40 and A2 represents 60, I would like to output an additional column showing that A1 represents 40% and A2 60%. – SBad Mar 20 '18 at 08:57
This is a similar question - https://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby – Shibani Mar 20 '18 at 09:32
Thank you, Shibani. It works fine and gives the percentages I was looking for. I definitely need to learn more about lambda functions. – SBad Mar 20 '18 at 09:37
This would be a good place to start reading about lambda: https://www.python-course.eu/lambda.php – Shibani Mar 20 '18 at 09:41
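To make the lambda discussed in the comments above less mysterious, here is a small sketch showing that it is just an unnamed function. It uses made-up sums matching the 40/60 example from the comments, plus a second invented category B. The helper name `share_of_group` is hypothetical.

```python
import pandas as pd

# Made-up sums: A1=40, A2=60 as in the comments; B1=1, B2=3 added for contrast.
index = pd.MultiIndex.from_tuples(
    [('A', 'A1'), ('A', 'A2'), ('B', 'B1'), ('B', 'B2')],
    names=['key', 'data2'])
sums = pd.Series([40, 60, 1, 3], index=index, name='data1')

def share_of_group(group):
    """Return each value as a percentage of its group's total."""
    return 100 * group / group.sum()

# Named function and lambda do the same thing for each key.
with_named = sums.groupby(level='key').transform(share_of_group)
with_lambda = sums.groupby(level='key').transform(lambda g: 100 * g / g.sum())

print(with_named)
print(with_named.equals(with_lambda))
```

Both calls give 40% and 60% for A1 and A2, and 25% and 75% for B1 and B2. The lambda is simply a shorter way to write the one-line function inline.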
