0

I have a long format df that has an amount column (absolute values) aggregated up to 3 different levels of date, country and group.

import pandas as pd
df  = pd.DataFrame.from_dict([{ "date": "2022-02", "country": "Serbia", "group": 3, "amount": 33948 }, { "date": "2021-05", "country": "Thailand", "group": 3, "amount": 15857 }, { "date": "2021-05", "country": "Russia", "group": 2, "amount": 42855 }, { "date": "2021-11", "country": "Ukraine", "group": 3, "amount": 57306 }, { "date": "2021-06", "country": "Poland", "group": 2, "amount": 52898 }, { "date": "2022-02", "country": "Indonesia", "group": 3, "amount": 32330 }, { "date": "2021-11", "country": "Indonesia", "group": 1, "amount": 33791 }, { "date": "2022-02", "country": "China", "group": 3, "amount": 45050 }, { "date": "2021-12", "country": "Indonesia", "group": 1, "amount": 13865 }, { "date": "2022-03", "country": "Sweden", "group": 1, "amount": 45039 }, { "date": "2021-05", "country": "Colombia", "group": 3, "amount": 9363 }, { "date": "2022-01", "country": "Bangladesh", "group": 1, "amount": 47121 }, { "date": "2022-02", "country": "Indonesia", "group": 2, "amount": 18855 }, { "date": "2021-05", "country": "China", "group": 1, "amount": 49383 }, { "date": "2021-06", "country": "Turkmenistan", "group": 3, "amount": 61386 }, { "date": "2021-09", "country": "Kenya", "group": 3, "amount": 40434 }, { "date": "2022-03", "country": "Nicaragua", "group": 3, "amount": 3801 }, { "date": "2022-02", "country": "China", "group": 1, "amount": 39416 }, { "date": "2022-03", "country": "Brazil", "group": 1, "amount": 13657 }, { "date": "2021-05", "country": "Colombia", "group": 2, "amount": 23473 }, { "date": "2022-02", "country": "China", "group": 3, "amount": 19742 }, { "date": "2021-08", "country": "Russia", "group": 2, "amount": 45098 }, { "date": "2022-01", "country": "China", "group": 3, "amount": 15158 }, { "date": "2021-08", "country": "China", "group": 3, "amount": 18376 }, { "date": "2022-01", "country": "Slovenia", "group": 2, "amount": 71213 }, { "date": "2022-02", "country": "Czech Republic", "group": 2, "amount": 32744 }, { "date": "2021-06", "country": "Netherlands", "group": 1, "amount": 42706 }, { "date": "2021-07", "country": "China", "group": 2, "amount": 40277 }, { "date": "2021-08", "country": "United States", "group": 2, "amount": 3070 }, { "date": "2021-07", "country": "Germany", "group": 3, "amount": 17039 }, { "date": "2021-12", "country": "China", "group": 2, "amount": 8714 }, { "date": "2022-01", "country": "Malta", "group": 2, "amount": 44230 }, { "date": "2022-01", "country": "Russia", "group": 3, "amount": 33626 }, { "date": "2021-09", "country": "Greece", "group": 2, "amount": 72860 }, { "date": "2021-08", "country": "China", "group": 1, "amount": 59254 }, { "date": "2022-01", "country": "Japan", "group": 3, "amount": 18136 }, { "date": "2021-08", "country": "Venezuela", "group": 2, "amount": 14065 }, { "date": "2022-01", "country": "China", "group": 2, "amount": 36930 }, { "date": "2022-01", "country": "Honduras", "group": 2, "amount": 768 }, { "date": "2021-08", "country": "Vietnam", "group": 2, "amount": 33652 }, { "date": "2021-07", "country": "Ukraine", "group": 2, "amount": 54050 }, { "date": "2021-09", "country": "Indonesia", "group": 2, "amount": 50304 }, { "date": "2021-10", "country": "Peru", "group": 1, "amount": 27157 }, { "date": "2021-08", "country": "Brazil", "group": 3, "amount": 15869 }, { "date": "2021-11", "country": "Sweden", "group": 1, "amount": 32451 }, { "date": "2021-12", "country": "Mozambique", "group": 2, "amount": 29659 }, { "date": "2022-01", "country": "Argentina", "group": 2, "amount": 25282 }, { "date": "2021-06", "country": "Mongolia", "group": 2, "amount": 63027 }, { "date": "2021-07", "country": "Sudan", "group": 2, "amount": 5006 }, { "date": "2021-08", "country": "United States", "group": 2, "amount": 73414 }, { "date": "2021-05", "country": "China", "group": 3, "amount": 34759 }, { "date": "2021-12", "country": "Brazil", "group": 1, "amount": 636 }, { "date": "2021-06", "country": "Philippines", "group": 2, "amount": 59227 }, { "date": "2021-10", "country": "Russia", "group": 1, "amount": 28537 }, { "date": "2021-08", "country": "China", "group": 3, "amount": 23460 }, { "date": "2022-02", "country": "Philippines", "group": 2, "amount": 62968 }, { "date": "2021-10", "country": "Ukraine", "group": 3, "amount": 63908 }, { "date": "2021-10", "country": "Ukraine", "group": 3, "amount": 38263 }, { "date": "2021-06", "country": "Botswana", "group": 1, "amount": 15918 }, { "date": "2022-02", "country": "Russia", "group": 1, "amount": 31156 }, { "date": "2021-07", "country": "France", "group": 3, "amount": 64077 }, { "date": "2021-07", "country": "China", "group": 1, "amount": 18932 }, { "date": "2022-02", "country": "Russia", "group": 1, "amount": 45279 }, { "date": "2021-07", "country": "Russia", "group": 1, "amount": 7849 }, { "date": "2021-09", "country": "China", "group": 1, "amount": 52640 }, { "date": "2021-07", "country": "Peru", "group": 2, "amount": 19369 }, { "date": "2021-07", "country": "Greece", "group": 1, "amount": 20489 }, { "date": "2021-11", "country": "China", "group": 3, "amount": 30177 }, { "date": "2021-07", "country": "Portugal", "group": 1, "amount": 69521 }, { "date": "2021-06", "country": "Thailand", "group": 3, "amount": 17341 }, { "date": "2021-12", "country": "Peru", "group": 3, "amount": 27012 }, { "date": "2021-12", "country": "Afghanistan", "group": 1, "amount": 34146 }, { "date": "2021-11", "country": "Indonesia", "group": 1, "amount": 57619 }, { "date": "2021-05", "country": "Portugal", "group": 1, "amount": 37319 }, { "date": "2022-01", "country": "Denmark", "group": 1, "amount": 18370 }, { "date": "2022-01", "country": "United States", "group": 3, "amount": 4690 }, { "date": "2021-12", "country": "China", "group": 1, "amount": 35333 }, { "date": "2021-10", "country": "Indonesia", "group": 3, "amount": 74285 }, { "date": "2021-09", "country": "Mexico", "group": 1, "amount": 11260 }, { "date": "2021-11", "country": "Ukraine", "group": 3, "amount": 44389 }, { "date": "2021-11", "country": "France", "group": 3, "amount": 29432 }, { "date": "2021-08", "country": "Ecuador", "group": 1, "amount": 24529 }, { "date": "2021-08", "country": "Democratic Republic of the Congo", "group": 1, "amount": 5211 }, { "date": "2021-12", "country": "Georgia", "group": 3, "amount": 54164 }, { "date": "2021-05", "country": "France", "group": 2, "amount": 9046 }, { "date": "2021-05", "country": "Sweden", "group": 1, "amount": 10326 }, { "date": "2022-02", "country": "Madagascar", "group": 1, "amount": 70109 }, { "date": "2022-01", "country": "China", "group": 1, "amount": 25702 }, { "date": "2021-09", "country": "Poland", "group": 2, "amount": 46625 }, { "date": "2022-01", "country": "Czech Republic", "group": 1, "amount": 23806 }, { "date": "2021-06", "country": "Poland", "group": 2, "amount": 63310 }, { "date": "2021-11", "country": "Poland", "group": 3, "amount": 56290 }, { "date": "2021-12", "country": "Russia", "group": 3, "amount": 45846 }, { "date": "2021-09", "country": "Sweden", "group": 3, "amount": 26358 }, { "date": "2021-09", "country": "Colombia", "group": 2, "amount": 14682 }, { "date": "2021-11", "country": "China", "group": 1, "amount": 65021 }, { "date": "2022-02", "country": "Peru", "group": 1, "amount": 29406 }, { "date": "2022-01", "country": "China", "group": 1, "amount": 57333 }, { "date": "2021-05", "country": "Philippines", "group": 2, "amount": 28340 }, { "date": "2021-10", "country": "Japan", "group": 2, "amount": 37300 }])

example df

    date    country  group  amount
0   2022-02 Serbia   3      33948
1   2021-05 Thailand 3      15857
2   2021-05 Russia   2      42855
3   2021-11 Ukraine  3      57306
4   2021-06 Poland   2      52898
...

Date can be any yyyy-mm, Country can be any country, and group can be either 1, 2, 3.

What I would like to do is group by date and country, and then for each group, work out the relative percentage of the grouped date and country.

e.g. take the original df above to get something like (example of one date and one country):

date    country  group  amount
2022-02 Serbia   1      33948
                 2      34567
                 3      96787

and then convert the amount to a percentage:

date    country  group  amount_percentage
2022-02 Serbia   1      20.5
                 2      20.9
                 3      58.6

before collapsing back to the original format for all date and countries:

    date    country  group  amount_percentage
0   2022-02 Serbia   1      20.5
1   2022-02 Serbia   2      20.9
2   2022-02 Serbia   3      58.6
...

My current approach to tackling this is:

df.groupby(['date', 'country', 'group'])['amount'].sum().unstack()

which gives me grouped date and country columns, and the groups becoming column with relative amounts.

date    country  1     2     3
2022-02 Serbia   33948 34567 96787  
        USA      23457 67589 23456  
...

However I am not sure how to convert these to percentages of the row total, and then convert the dataframe back to final format. How would you address this last part?

1 Answers1

2

If I understand correctly, you can do a groupby and then transform('sum') on amount, and divide amount by that:

df['amount_percentage'] = df['amount'] / df.groupby(['date', 'country'], sort=False)['amount'].transform('sum') * 100

Output:

>>> df
       date      country  group  amount  amount_percentage
0   2022-02       Serbia      3   33948         100.000000
1   2021-05     Thailand      3   15857         100.000000
2   2021-05       Russia      2   42855         100.000000
3   2021-11      Ukraine      3   57306          56.350853
4   2021-06       Poland      2   52898          45.520102
..      ...          ...    ...     ...                ...
95  2021-11        China      1   65021          68.300805
96  2022-02         Peru      1   29406         100.000000
97  2022-01        China      1   57333          42.430230
98  2021-05  Philippines      2   28340         100.000000
99  2021-10        Japan      2   37300         100.000000
  • 1
    Glad to see you!!!!!!!!! I thought you wouldn't come back :) I can go to sleep now. lol – Corralien Mar 10 '22 at 22:03
  • 1
    Welcome back ;) – mozway Mar 10 '22 at 22:08
  • Thank @mozway! :) I'll avoid the mistake in the future though, so I don't get suspended :) –  Mar 10 '22 at 22:13
  • There's nothing wrong in telling a new user how to accept an answer, but I was doing it a bit too much, and some of the users weren't new and they knew how to accept answers :) –  Mar 10 '22 at 22:15
  • Haha...well, I'm sorry about that if it _is_ true! I know definitely lots of yours are better than mine anyway :P –  Mar 11 '22 at 15:33