I would like to find an easy way to compute the percentage of each sub-category within its category after I sum each group. Here is an example:

import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'C1', 'A2', 'B2', 'C2']},
                  columns=['key', 'data1', 'data2'])

df.groupby(['key', 'data2'])['data1'].sum()

What I would like to do is create an additional column that shows the percentage of each sub-category (i.e., A1, etc.) within its respective category (i.e., A, etc.). For example, I would like to know the percentages from A1/(A1+A2) through C2/(C1+C2).

What's the easiest way to do that, please?

Arpit Solanki
SBad
  • Possible duplicate of [Pandas percentage of total with groupby](https://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby) – Andrew L Mar 20 '18 at 09:29

1 Answer

Please clarify what you mean by the percentage A1/(A1+A2). Is the column "data2" an integer type?

Ok, I presume this should work for you:

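# Sum data1 for each (key, data2) pair.
sums = df.groupby(['key', 'data2']).agg({'data1': 'sum'})

# Divide each sum by the total of its key (index level 0) to get a percentage.
percentages = sums.groupby(level=0).transform(lambda x: 100 * x / x.sum())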
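
Since the question asks for the percentage as an additional column next to the sums, here is a minimal, self-contained sketch of one way that might be done. It reuses the question's sample data, swaps the lambda for the built-in `transform('sum')`, and stores the result in a column named `pct`, which is a hypothetical name chosen only for this illustration.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': ['A1', 'B1', 'C1', 'A2', 'B2', 'C2']})

# Sum data1 for every (key, data2) pair; the result is indexed by both.
result = df.groupby(['key', 'data2'])[['data1']].sum()

# Total per key, repeated on every row that belongs to that key.
key_totals = result.groupby(level='key')['data1'].transform('sum')

# 'pct' is a hypothetical column name chosen for this sketch.
result['pct'] = 100 * result['data1'] / key_totals

print(result)
```

With this data, the percentages within each key add up to 100. A key whose total is zero would produce NaN in `pct`, so that case may need separate handling.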
Shibani
  • Yes, it is an integer. Just to clarify: if the sum of category A (i.e., A1+A2) is 100, with A1 representing 40 and A2 representing 60, I would like to output an additional column showing that A1 represents 40% and A2 60%. – SBad Mar 20 '18 at 08:57
  • This is a similar question - https://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby – Shibani Mar 20 '18 at 09:32
  • Thank you, Shibani. It works fine and gives the percentages I was looking for. I definitely need to learn more about lambda functions. – SBad Mar 20 '18 at 09:37
  • This would be a good place to start reading about lambda: https://www.python-course.eu/lambda.php – Shibani Mar 20 '18 at 09:41