I would like to know whether I can convert a DataFrame with a MultiIndex on both the rows and the columns (multi-index and multi-header) to a nested dict.
I have the following DataFrame:
                       revenue                     taxes
                           sum      max      min    mean       sum
company  type
-        -               15.00    15.00    15.00    0.00      0.00
Facebook -             1259.79   218.50  -120.00    5.17    321.00
         City TOT         0.00     0.00     0.00    4.00      4.00
         Country TOT      0.00     0.00     0.00    4.00      4.00
         Sales            0.00     0.00     0.00    9.25     18.50
Google   %tax_1           0.00     0.00     0.00    0.89      3.58
         -             3738.36  2643.08  -100.00   96.23  26369.57
         City TOT         0.00     0.00     0.00    3.55     95.99
         Country TOT      0.00     0.00     0.00   23.25    628.00
...
import pandas as pd

df = pd.DataFrame.from_dict({('-', '-'): {('revenue', 'sum'): 15.0,
('revenue', 'max'): 15.0,
('revenue', 'min'): 15.0,
('taxes', 'mean'): 0.00,
('taxes', 'sum'): 0.0},
('Facebook', '-'): {('revenue', 'sum'): 1259.79,
('revenue', 'max'): 218.5,
('revenue', 'min'): -120.0,
('taxes', 'mean'): 5.17,
('taxes', 'sum'): 321.0},
('Facebook', 'City TOT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 4.00,
('taxes', 'sum'): 4.0},
('Facebook', 'Country TOT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 4.00,
('taxes', 'sum'): 4.0},
('Facebook', 'Sales'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 9.25,
('taxes', 'sum'): 18.5},
('Google', '%tax_1'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 0.89,
('taxes', 'sum'): 3.58},
('Google', '-'): {('revenue', 'sum'): 3738.36,
('revenue', 'max'): 2643.08,
('revenue', 'min'): -100.0,
('taxes', 'mean'): 96.23,
('taxes', 'sum'): 26369.57},
('Google', 'City TOT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 3.55,
('taxes', 'sum'): 95.99},
('Google', 'Country TOT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 23.25,
('taxes', 'sum'): 628.0},
('Google', 'PER GETS'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 0.88,
('taxes', 'sum'): 2.64},
('Google', 'Sales'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 12.61,
('taxes', 'sum'): 138.75},
('Google', 'VAT'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 11.70,
('taxes', 'sum'): 1065.51},
('Google', 'per room_1'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 5.00,
('taxes', 'sum'): 20.0},
('Google', 'tax on top'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 8.68,
('taxes', 'sum'): 78.2},
('Google', 'tax per reserv'): {('revenue', 'sum'): 0.0,
('revenue', 'max'): 0.0,
('revenue', 'min'): 0.0,
('taxes', 'mean'): 20.00,
('taxes', 'sum'): 40.0}}, orient='index')
df.index.names = ['company', 'type']
The columns and index are MultiIndex:
MultiIndex([('revenue', 'sum'),
('revenue', 'max'),
('revenue', 'min'),
( 'taxes', 'mean'),
( 'taxes', 'sum')],)
MultiIndex([( '-', '-'),
( 'Facebook', '-'),
( 'Facebook', 'City TOT'),
( 'Facebook', 'Country TOT'),
( 'Facebook', 'Sales'),
( 'Google', '%tax_1'),
...
I have tried to use this method (based on Pandas multi index dataframe to nested dictionary):
{level: df.xs(level).to_dict('index') for level in df.index.levels[0]}
However, this is what I get:
{'-': {('revenue', 'sum'): {'-': 15.0},
('revenue', 'max'): {'-': 15.0},
('revenue', 'min'): {'-': 15.0},
('taxes', 'mean'): {'-': 0.0},
('taxes', 'sum'): {'-': 0.0}},
'Facebook': {('revenue', 'sum'): {'-': 1259.79,
'City TOT': 0.0,
'Country TOT': 0.0,
'Sales': 0.0},
('revenue', 'max'): {'-': 218.5,
'City TOT': 0.0,
'Country TOT': 0.0,
'Sales': 0.0},
('revenue', 'min'): {'-': -120.0,
'City TOT': 0.0,
'Country TOT': 0.0,
'Sales': 0.0},
...
What I expect instead is a dict nested on the column levels as well as the index levels, so that the tuple column keys become nested dicts, like this:
{
    '-': {
        '-': {
            'revenue': {
                'sum': 15.0,
                'max': 15.0,
                'min': 15.0,
            },
            'taxes': {
                'mean': 0.00,
                'sum': 0.00,
            }
        },
    },
    'Facebook': {
        '-': {
            'revenue': {
                'sum': 1259.79,
                'max': 218.5,
                'min': -120.0
            },
            'taxes': ...
        },
        'City TOT': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            },
            'taxes': ...
        },
        'Country TOT': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            },
            'taxes': ...
        },
        'Sales': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            },
            'taxes': ...
        },
    }
}
Is there a way to achieve this? Thanks!
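For reference, here is a minimal sketch of one way the structure above might be built. It is not the method from the linked question. It assumes both the index and the columns are MultiIndex objects (so every row and column key is a tuple). The helper name to_nested_dict is hypothetical, and the small DataFrame is only a stand-in for the full data:

```python
import pandas as pd

# Small stand-in for the DataFrame in the question:
# two index levels (company, type) and two column levels.
df = pd.DataFrame(
    [[15.0, 15.0, 15.0, 0.0, 0.0],
     [1259.79, 218.5, -120.0, 5.17, 321.0],
     [0.0, 0.0, 0.0, 4.0, 4.0]],
    index=pd.MultiIndex.from_tuples(
        [("-", "-"), ("Facebook", "-"), ("Facebook", "City TOT")],
        names=["company", "type"],
    ),
    columns=pd.MultiIndex.from_tuples(
        [("revenue", "sum"), ("revenue", "max"), ("revenue", "min"),
         ("taxes", "mean"), ("taxes", "sum")]
    ),
)


def to_nested_dict(frame):
    """Hypothetical helper: nest values by every index level, then every column level.

    Assumes frame.index and frame.columns are both MultiIndex (tuple keys).
    """
    nested = {}
    for row_key, row in frame.iterrows():
        for col_key, value in row.items():
            # Full path to the value, e.g. ('Facebook', '-', 'revenue', 'sum').
            *parents, leaf = (*row_key, *col_key)
            node = nested
            # Walk down the path, creating intermediate dicts as needed.
            for key in parents:
                node = node.setdefault(key, {})
            node[leaf] = value
    return nested


result = to_nested_dict(df)
```

With this stand-in data, result['Facebook']['City TOT']['taxes'] is a dict with the keys 'mean' and 'sum'. The loop visits every cell, so it may be slow on large frames; it is meant only to illustrate the nesting.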