I have a long format df that has an amount column (absolute values) aggregated up to 3 different levels of date, country and group.
import pandas as pd
df = pd.DataFrame.from_dict([{ "date": "2022-02", "country": "Serbia", "group": 3, "amount": 33948 }, { "date": "2021-05", "country": "Thailand", "group": 3, "amount": 15857 }, { "date": "2021-05", "country": "Russia", "group": 2, "amount": 42855 }, { "date": "2021-11", "country": "Ukraine", "group": 3, "amount": 57306 }, { "date": "2021-06", "country": "Poland", "group": 2, "amount": 52898 }, { "date": "2022-02", "country": "Indonesia", "group": 3, "amount": 32330 }, { "date": "2021-11", "country": "Indonesia", "group": 1, "amount": 33791 }, { "date": "2022-02", "country": "China", "group": 3, "amount": 45050 }, { "date": "2021-12", "country": "Indonesia", "group": 1, "amount": 13865 }, { "date": "2022-03", "country": "Sweden", "group": 1, "amount": 45039 }, { "date": "2021-05", "country": "Colombia", "group": 3, "amount": 9363 }, { "date": "2022-01", "country": "Bangladesh", "group": 1, "amount": 47121 }, { "date": "2022-02", "country": "Indonesia", "group": 2, "amount": 18855 }, { "date": "2021-05", "country": "China", "group": 1, "amount": 49383 }, { "date": "2021-06", "country": "Turkmenistan", "group": 3, "amount": 61386 }, { "date": "2021-09", "country": "Kenya", "group": 3, "amount": 40434 }, { "date": "2022-03", "country": "Nicaragua", "group": 3, "amount": 3801 }, { "date": "2022-02", "country": "China", "group": 1, "amount": 39416 }, { "date": "2022-03", "country": "Brazil", "group": 1, "amount": 13657 }, { "date": "2021-05", "country": "Colombia", "group": 2, "amount": 23473 }, { "date": "2022-02", "country": "China", "group": 3, "amount": 19742 }, { "date": "2021-08", "country": "Russia", "group": 2, "amount": 45098 }, { "date": "2022-01", "country": "China", "group": 3, "amount": 15158 }, { "date": "2021-08", "country": "China", "group": 3, "amount": 18376 }, { "date": "2022-01", "country": "Slovenia", "group": 2, "amount": 71213 }, { "date": "2022-02", "country": "Czech Republic", "group": 2, "amount": 32744 }, { "date": "2021-06", "country": "Netherlands", "group": 1, "amount": 42706 }, { "date": "2021-07", "country": "China", "group": 2, "amount": 40277 }, { "date": "2021-08", "country": "United States", "group": 2, "amount": 3070 }, { "date": "2021-07", "country": "Germany", "group": 3, "amount": 17039 }, { "date": "2021-12", "country": "China", "group": 2, "amount": 8714 }, { "date": "2022-01", "country": "Malta", "group": 2, "amount": 44230 }, { "date": "2022-01", "country": "Russia", "group": 3, "amount": 33626 }, { "date": "2021-09", "country": "Greece", "group": 2, "amount": 72860 }, { "date": "2021-08", "country": "China", "group": 1, "amount": 59254 }, { "date": "2022-01", "country": "Japan", "group": 3, "amount": 18136 }, { "date": "2021-08", "country": "Venezuela", "group": 2, "amount": 14065 }, { "date": "2022-01", "country": "China", "group": 2, "amount": 36930 }, { "date": "2022-01", "country": "Honduras", "group": 2, "amount": 768 }, { "date": "2021-08", "country": "Vietnam", "group": 2, "amount": 33652 }, { "date": "2021-07", "country": "Ukraine", "group": 2, "amount": 54050 }, { "date": "2021-09", "country": "Indonesia", "group": 2, "amount": 50304 }, { "date": "2021-10", "country": "Peru", "group": 1, "amount": 27157 }, { "date": "2021-08", "country": "Brazil", "group": 3, "amount": 15869 }, { "date": "2021-11", "country": "Sweden", "group": 1, "amount": 32451 }, { "date": "2021-12", "country": "Mozambique", "group": 2, "amount": 29659 }, { "date": "2022-01", "country": "Argentina", "group": 2, "amount": 25282 }, { "date": "2021-06", "country": "Mongolia", "group": 2, "amount": 63027 }, { "date": "2021-07", "country": "Sudan", "group": 2, "amount": 5006 }, { "date": "2021-08", "country": "United States", "group": 2, "amount": 73414 }, { "date": "2021-05", "country": "China", "group": 3, "amount": 34759 }, { "date": "2021-12", "country": "Brazil", "group": 1, "amount": 636 }, { "date": "2021-06", "country": "Philippines", "group": 2, "amount": 59227 }, { "date": "2021-10", "country": "Russia", "group": 1, "amount": 28537 }, { "date": "2021-08", "country": "China", "group": 3, "amount": 23460 }, { "date": "2022-02", "country": "Philippines", "group": 2, "amount": 62968 }, { "date": "2021-10", "country": "Ukraine", "group": 3, "amount": 63908 }, { "date": "2021-10", "country": "Ukraine", "group": 3, "amount": 38263 }, { "date": "2021-06", "country": "Botswana", "group": 1, "amount": 15918 }, { "date": "2022-02", "country": "Russia", "group": 1, "amount": 31156 }, { "date": "2021-07", "country": "France", "group": 3, "amount": 64077 }, { "date": "2021-07", "country": "China", "group": 1, "amount": 18932 }, { "date": "2022-02", "country": "Russia", "group": 1, "amount": 45279 }, { "date": "2021-07", "country": "Russia", "group": 1, "amount": 7849 }, { "date": "2021-09", "country": "China", "group": 1, "amount": 52640 }, { "date": "2021-07", "country": "Peru", "group": 2, "amount": 19369 }, { "date": "2021-07", "country": "Greece", "group": 1, "amount": 20489 }, { "date": "2021-11", "country": "China", "group": 3, "amount": 30177 }, { "date": "2021-07", "country": "Portugal", "group": 1, "amount": 69521 }, { "date": "2021-06", "country": "Thailand", "group": 3, "amount": 17341 }, { "date": "2021-12", "country": "Peru", "group": 3, "amount": 27012 }, { "date": "2021-12", "country": "Afghanistan", "group": 1, "amount": 34146 }, { "date": "2021-11", "country": "Indonesia", "group": 1, "amount": 57619 }, { "date": "2021-05", "country": "Portugal", "group": 1, "amount": 37319 }, { "date": "2022-01", "country": "Denmark", "group": 1, "amount": 18370 }, { "date": "2022-01", "country": "United States", "group": 3, "amount": 4690 }, { "date": "2021-12", "country": "China", "group": 1, "amount": 35333 }, { "date": "2021-10", "country": "Indonesia", "group": 3, "amount": 74285 }, { "date": "2021-09", "country": "Mexico", "group": 1, "amount": 11260 }, { "date": "2021-11", "country": "Ukraine", "group": 3, "amount": 44389 }, { "date": "2021-11", "country": "France", "group": 3, "amount": 29432 }, { "date": "2021-08", "country": "Ecuador", "group": 1, "amount": 24529 }, { "date": "2021-08", "country": "Democratic Republic of the Congo", "group": 1, "amount": 5211 }, { "date": "2021-12", "country": "Georgia", "group": 3, "amount": 54164 }, { "date": "2021-05", "country": "France", "group": 2, "amount": 9046 }, { "date": "2021-05", "country": "Sweden", "group": 1, "amount": 10326 }, { "date": "2022-02", "country": "Madagascar", "group": 1, "amount": 70109 }, { "date": "2022-01", "country": "China", "group": 1, "amount": 25702 }, { "date": "2021-09", "country": "Poland", "group": 2, "amount": 46625 }, { "date": "2022-01", "country": "Czech Republic", "group": 1, "amount": 23806 }, { "date": "2021-06", "country": "Poland", "group": 2, "amount": 63310 }, { "date": "2021-11", "country": "Poland", "group": 3, "amount": 56290 }, { "date": "2021-12", "country": "Russia", "group": 3, "amount": 45846 }, { "date": "2021-09", "country": "Sweden", "group": 3, "amount": 26358 }, { "date": "2021-09", "country": "Colombia", "group": 2, "amount": 14682 }, { "date": "2021-11", "country": "China", "group": 1, "amount": 65021 }, { "date": "2022-02", "country": "Peru", "group": 1, "amount": 29406 }, { "date": "2022-01", "country": "China", "group": 1, "amount": 57333 }, { "date": "2021-05", "country": "Philippines", "group": 2, "amount": 28340 }, { "date": "2021-10", "country": "Japan", "group": 2, "amount": 37300 }])
example df
date country group amount
0 2022-02 Serbia 3 33948
1 2021-05 Thailand 3 15857
2 2021-05 Russia 2 42855
3 2021-11 Ukraine 3 57306
4 2021-06 Poland 2 52898
...
Date can be any yyyy-mm
, Country can be any country, and group can be either 1
, 2
, 3
.
What I would like to do is group by date and country, and then for each group, work out the relative percentage of the grouped date and country.
e.g. take the original df above to get something like (example of one date and one country):
date country group amount
2022-02 Serbia 1 33948
2 34567
3 96787
and then convert the amount to a percentage:
date country group amount_percentage
2022-02 Serbia 1 20.5
2 20.9
3 58.6
before collapsing back to the original format for all date and countries:
date country group amount_percentage
0 2022-02 Serbia 1 20.5
1 2022-02 Serbia 2 20.9
2 2022-02 Serbia 3 58.6
...
My current approach to tackling this is:
df.groupby(['date', 'country', 'group'])['amount'].sum().unstack()
which gives me grouped date and country columns, and the groups becoming column with relative amounts.
date country 1 2 3
2022-02 Serbia 33948 34567 96787
USA 23457 67589 23456
...
However I am not sure how to convert these to percentages of the row total, and then convert the dataframe back to final format. How would you address this last part?