I am working with a large Pandas dataframe that has multiple index levels in both rows and columns, something like:
df
Metric | Population Homes
Year | 2018 2019 2020 2018 2019 2020
Town Sector |
---- ------ | ---- ---- ---- ---- ---- ----
A 1 | 100 110 120 50 52 52
2 | 200 205 210 80 80 80
3 | 300 300 300 100 100 100
B 1 | 50 60 70 20 22 24
2 | 100 100 100 40 40 40
3 | 150 140 130 50 47 44
I need to perform calculations for groups of columns, eg. find ratio between population and homes.
Step by step that would be:
# 1. Calculation
R = df["Population"] / df["Homes"]
R
Year | 2018 2019 2020
Town Sector |
---- ------ | ---- ---- ----
A 1 | 2.0 2.1 2.3
2 | 2.5 2.6 2.6
3 | 3.0 3.0 3.0
B 1 | 2.5 2.7 2.9
2 | 2.5 2.5 2.5
3 | 3.0 3.0 3.0
# 2. Re-build multiindex for columns (there are various levels, showing only one here)
R = pd.concat([R],keys=["Ratio"],axis=1)
R
| Ratio
Year | 2018 2019 2020
Town Sector |
---- ------ | ---- ---- ----
A 1 | 2.0 2.1 2.3
2 | 2.5 2.6 2.6
3 | 3.0 3.0 3.0
B 1 | 2.5 2.7 2.9
2 | 2.5 2.5 2.5
3 | 3.0 3.0 3.0
# 3. Concat previous calculation to the main dataframe
df = pd.concat([df,R],axis=1,sort=True) # I need the sort=True to avoid a performance Warning
df
Metric | Population Homes Ratio
Year | 2018 2019 2020 2018 2019 2020 2018 2019 2020
Town Sector |
---- ------ | ---- ---- ---- ---- ---- ---- ---- ---- ----
A 1 | 100 110 120 50 52 52 2.0 2.1 2.3
2 | 200 205 210 80 80 80 2.5 2.6 2.6
3 | 300 300 300 100 100 100 3.0 3.0 3.0
B 1 | 50 60 70 20 22 24 2.5 2.7 2.9
2 | 100 100 100 40 40 40 2.5 2.5 2.5
3 | 150 140 130 50 47 44 3.0 3.0 3.0
I can write the above expresions in a single line, but as I mentioned I have various index levels and it becomes complicated... Is there a way to do something simpler?
I would have guessed:
df["Ratio"] = df["Population"]/df["Homes"]
But it throws a "ValueError: Item must have length equal to number of levels."
Thanks!