Input code:
import pandas as pd
import numpy as np
#Dummy df:
df = pd.DataFrame({'Name': ['John', 'Boby', 'Mina', 'Peter',
'Nicky','Peter','Mina','Peter'],
'City': ['London','NY','LA','London','NY','HK','NY','HK'],
'Stage': ['Masters', 'Graduate', 'Graduate', 'Masters',
'Graduate','Masters','Graduate','Graduate'],
'Year':[2020,2019,2020,2019,2020,2019,2020,2020],
'Month':[202001,201902,202003,201904,202005,201902,202007,202012],
'Earnings': [27, 23, 21, 66, 24,22,34,65]})
df_pivot=pd.pivot_table(df,values = 'Earnings', index=
['Name','City','Stage'], columns = ['Year','Month'], aggfunc=np.sum,
fill_value=0, margins = True).sort_values('All', ascending=False)
print(df_pivot)
Output pivot table:
Year 2019 2020
All
Month 201902 201904 202001 202003 202005 202007 202012
Name City Stage
All 45 66 27 21 24 34 65 282
Peter London Masters 0 66 0 0 0 0 0 66
HK Graduate 0 0 0 0 0 0 65 65
Mina NY Graduate 0 0 0 0 0 34 0 34
John London Masters 0 0 27 0 0 0 0 27
Nicky NY Graduate 0 0 0 0 24 0 0 24
Boby NY Graduate 23 0 0 0 0 0 0 23
Peter HK Masters 22 0 0 0 0 0 0 22
Mina LA Graduate 0 0 0 21 0 0 0 21
Desired output sorted firstly by first column, then within the group by second column and lastly within the group by 3rd column:
Year 2019 2020 All
Month 201902 201904 202001 202003 202005 202007 202012
Name City Stage
All 45 66 27 21 24 34 65 282
Peter HK Graduate 0 0 0 0 0 0 65 65
Masters 22 0 0 0 0 0 0 22
London Masters 0 66 0 0 0 0 0 66
Mina NY Graduate 0 0 0 0 0 34 0 34
LA Graduate 0 0 0 21 0 0 0 21
John London Masters 0 0 27 0 0 0 0 27
Nicky NY Graduate 0 0 0 0 24 0 0 24
Boby NY Graduate 23 0 0 0 0 0 0 23
Please note how Peter-HK is higher than Peter-London, because sum of Peter-HK (65+22) > sum of Peter-London (66).
In other words: First give me Name with biggest total, then within that name give me City with Biggest total, then within that Name and that City give me Stage with biggest total.
Thank you pawel