0

I have the following df:

    Region   Country  Category  Number
0   NA       CA       A         2
1   NA       CA       B         1
2   NA       US       A         2
3   NA       US       B         3
4   NA       MX       A         2
5   NA       MX       B         1
0   EU       FR       A         2
1   EU       FR       B         1
2   EU       NL       A         1
3   EU       NL       B         2
4   EU       DK       A         1
5   EU       DK       B         2

I am trying to create a new column that contains the relative contributions of a Country-Category to the overall Region.

For example, the NA-CA-A combination would have a contribution of 2/(2+2+2) or (NA-CA-A)/((NA-CA-A)+(NA-US-A)+(NA-MX-A)) = .333.. or 33.33...%.

Output would look like:

    Region   Country  Category  Population  Number
0   NA       CA       A         2           .33
1   NA       CA       B         1           .2
2   NA       US       A         2           .33
3   NA       US       B         3           .6
4   NA       MX       A         2           .33
5   NA       MX       B         1           .2
0   EU       FR       A         2           .5
1   EU       FR       B         1           .2
2   EU       NL       A         1           .25
3   EU       NL       B         2           .4
4   EU       DK       A         1           .25
5   EU       DK       B         2           .4

I have tried the following code with no luck:

df['Contribution'] = df.groupby(['Region','Country,'Category'])['Number'].apply(lambda x: x*100/x.sum())

Thank you for your help!

Jnams
  • 3
  • 2
  • You want `transform` instead of `apply` like [this answer](https://stackoverflow.com/a/57359372/15497888) – Henry Ecker Jul 14 '22 at 19:02
  • 1
    Also FWIW your results are grouped by Region and Category only. Not sure if that's a mess up in expected output or description: `df['Contribution'] = df.groupby(['Region', 'Category'])['Number'].transform(lambda x: x / x.sum())` (like [this answer](/a/55740831/15497888)) or `df['Contribution'] = df['Number'] / df.groupby(['Region', 'Category'])['Number'].transform('sum')` (like [this answer](/a/57359372/15497888)) produces the shown expected output. – Henry Ecker Jul 14 '22 at 19:08

0 Answers0