I have the following dataframe with a tricky problem:
Disease State Month Value
Covid Texas 2020-03 2
Covid Texas 2020-04 3
Covid Texas 2020-05 4
Covid Texas 2020-08 3
Cancer Florida 2020-04 4
Covid Florida 2020-03 6
Covid Florida 2020-04 4
Flu Florida 2020-03 5
I have to make a list of values for 3 consecutive months and create a new dataframe. However, there are some conditions:
The list will be created for each disease, each month (from start to end: 2020 Feb - 2021 April) and each state.
If any specific month is absent in the dataset, row for that month would be created and the value for that month would be 0.
Desired output:
Disease State Month ValueList
Covid Texas 2020-02 [0, 2, 3] (no dataset for Feb 20 but next two months are)
Covid Texas 2020-03 [2, 3, 4] (has values for 3 consecutive months)
Covid Texas 2020-04 [3, 4, 0] (doesn’t have value for 6th month)
Covid Texas 2020-05 [4, 0, 0] (has value for present month)
Covid Texas 2020-06 [0, 0, 3] (has value for 8th month)
Covid Texas 2020-07 [0, 3, 0] (has value for 8th month)
Covid Texas 2020-08 [3, 0, 0] (has value for present month)
Covid Texas 2020-09 [0, 0, 0] (no dataset for next 3 months)
Covid Texas 2020-10 [0, 0, 0] (no dataset for next 3 months)
Covid Texas 2020-11 [0, 0, 0] (no dataset for next 3 months)
Covid Texas 2020-12 [0, 0, 0] (no dataset for next 3 months)
Covid Texas 2021-01 [0, 0, 0] (no dataset for next 3 months)
Covid Texas 2021-02 [0, 0, 0] (no dataset for next 3 months)
Covid Texas 2021-03 [0, 0, 0] (no dataset for next 3 months)
Covid Texas 2021-04 [0, 0, 0] (no dataset for next 3 months)
I am trying to fill in dates using this:
df3= (df2.set_index('MonthEnd')
.groupby(['Disease', 'State']).apply(lambda x: x.drop(['Disease', 'State'], axis=1).asfreq('D'))
.reset_index())
However, it doesn't returns the same time frame for each group. It returns the values between the min and max date in that group.
I’m not sure how I should start. Any help would be appreciated. Thanks!