I generate summary statistics for each group in the data while using multiple grouping criteria. The data:
import pandas as pd
d = {'Fruit': ['Apple', 'Apple','Apple','Orange','Orange'],
'City': ['Rome', 'Rome', 'London', 'London','London' ],
'Value': [1, 2, 3, 4, 5 ]}
df = pd.DataFrame(data=d)
print(df)
Fruit City Value
0 Apple Rome 1
1 Apple Rome 2
2 Apple London 3
3 Orange London 4
4 Orange London 5
The "groupby" command returns the sums only for the highest level.
keys=['Fruit','City']
df.groupby(keys).agg(Total_Value=('Value', 'sum'))
Total_Value
Fruit City
Apple London 3
Rome 3
Orange London 9
I would like to apply the sum function to the lower levels as well. Preferably the results for all levels should be displayed on the same table:
Total_Value
Fruit City
15
Apple 6
Orange 9
London 12
Rome 3
Apple London 3
Apple Rome 3
Orange London 9
Is there an easy way of generating such a table? It might be possible to use the Multiindex function for this purpose, yet I am not sure how it should be applied.
Thanks a lot
PS: In the example I have used the "sum" function. It is relatively easy to calculate the sums for lower levels. I am looking for a generic answer which is applicable to all functions, not only to sum. (e.g. mean function, or a lambda function etc.)
PS2: Something like the following would also work:
Total_Value Level
Fruit City
15 0
Rome 3 1
London 12 1
Apple 6 1
Orange 9 1
Apple Rome 3 2
London 3 2
Orange London 9 2
In this case the "Total_Value" of "London" is immediately clear. It is not necessary to check the order of the rows.