I have mid-year estimated population data as illustrated below:
The dataframe can be created using the code below
import pandas as pd
df = pd.DataFrame({'Name':['WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
'WC - West Coast District Municipality (DC1)',
],
'Sex':['Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
'Male',
'Male',
'Female',
'Female',
],
'Age':['0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
'0-4',
'5-9',
],
'period':['2019/07/01',
'2019/07/01',
'2019/07/01',
'2019/07/01',
'2020/07/01',
'2020/07/01',
'2020/07/01',
'2020/07/01',
'2021/07/01',
'2021/07/01',
'2021/07/01',
'2021/07/01',
'2022/07/01',
'2022/07/01',
'2022/07/01',
'2022/07/01',
'2023/07/01',
'2023/07/01',
'2023/07/01',
'2023/07/01'],
'population':[21147.33972,
20435.77552,
20815.83029,
19908.72547,
21176.41455,
20678.62621,
20818.15366,
20166.97611,
21176.65456,
20819.50598,
20771.53888,
20316.90311,
21119.48584,
21024.48028,
20678.93492,
20525.76344,
21003.39475,
21219.41025,
20554.78559,
20706.95183,
]})
I want to convert it from yearly to monthly for each Name, Sex, Age
(i.e. groupby) in a linear manner (equal proportion): i.e. diff = (future mid-year estimate - current mid-year estimate)/12 then add the diff to current mid-year estimate.
I have done this in excel and the results is:
I have seen it done in R and seen examples using the .interpolate()
function but it does not consider this for data with multiple levels. What would be the best way to do it?