I have the following df:
Region Country Category Number
0 NA CA A 2
1 NA CA B 1
2 NA US A 2
3 NA US B 3
4 NA MX A 2
5 NA MX B 1
0 EU FR A 2
1 EU FR B 1
2 EU NL A 1
3 EU NL B 2
4 EU DK A 1
5 EU DK B 2
I am trying to create a new column that contains the relative contributions of a Country-Category to the overall Region.
For example, the NA-CA-A combination would have a contribution of 2/(2+2+2) or (NA-CA-A)/((NA-CA-A)+(NA-US-A)+(NA-MX-A)) = .333.. or 33.33...%.
Output would look like:
Region Country Category Population Number
0 NA CA A 2 .33
1 NA CA B 1 .2
2 NA US A 2 .33
3 NA US B 3 .6
4 NA MX A 2 .33
5 NA MX B 1 .2
0 EU FR A 2 .5
1 EU FR B 1 .2
2 EU NL A 1 .25
3 EU NL B 2 .4
4 EU DK A 1 .25
5 EU DK B 2 .4
I have tried the following code with no luck:
df['Contribution'] = df.groupby(['Region','Country,'Category'])['Number'].apply(lambda x: x*100/x.sum())
Thank you for your help!