Expanding and filling the dataframe for missing dates by each group

Question

   df = pd.DataFrame({"ID":[1,1,1,2,2,2],
                   "Date":["29.12.2020","05.01.2021","15.02.2021","11.04.2021","27.05.2021","29.05.2021"],
                   "Amount":[6,5,7,9,8,7]}) 
   df["Date"] = pd.to_datetime(df["Date"])

Data:

ID	Date	Amount
1	29.12.2020	6
1	05.01.2021	5
1	15.02.2021	7
2	11.04.2021	9
2	27.05.2021	8
2	29.05.2021	7

Desired output:

ID	Date	Amount
1	01.12.2020	0
.	.	.
.	.	.
1	28.12.2020	0
1	29.12.2020	6
.	.	.
.	.	.
1	04.01.2020	6
1	05.01.2021	5
.	.	.
.	.	.
1	14.02.2021	5
1	15.02.2021	7
.	.	.
.	.	.
1	28.02.2021	7
2	01.04.2021	0
.	.	.
.	.	.
2	10.04.2021	0
2	11.04.2021	9
.	.	.
.	.	.
2	26.05.2021	9
2	27.05.2021	8
2	28.05.2021	8
2	29.05.2021	7

So, basically, I want to expand the data set for the missing dates and insert the latest of preceding amount value for every newly added date.

The date ranges of the ID groups do vary.

Plus, fill value should be 0 for the unobserved values in the month associated with the first observation (e.g. Amount column takes the value of 0 between 01.12.2020 and 28.12.2020, since the first observation of twelfth month of 2020 is on 29th day).

Tranbi · Accepted Answer · 2021-11-11T09:01:09.457

I would set the df index to Date, then group by ID and finally reindex depending on the oldest (replacing it with the first day of the month) and most recent dates:

import pandas as pd

df = pd.DataFrame({"ID":[1,1,1,2,2,2],
                   "Date":["29.12.2020","05.01.2021","15.02.2021","11.04.2021","27.05.2021","29.05.2021"],
                   "Amount":[6,5,7,9,8,7]}) 
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")
df = df.set_index("Date")

new_df = pd.DataFrame()
for id_val, obs_period in df.groupby("ID"):
    date_range = pd.date_range(min(obs_period.index).replace(day=1), max(obs_period.index))
    obs_period = obs_period.reindex(date_range, fill_value=pd.NA)
    obs_period["ID"] = id_val
    if pd.isna(obs_period.at[obs_period.index[0], "Amount"]):
        obs_period.at[obs_period.index[0], "Amount"] = 0  # adding 0 at the beginning of the period if undefined
    obs_period= obs_period.ffill()      # filling Amount with last value
    new_df = pd.concat([new_df, obs_period])

print(new_df)

BTW you should specify your date format while converting df["Date"]

Output:

            ID  Amount
2020-12-01   1     0.0
2020-12-02   1     0.0
2020-12-03   1     0.0
2020-12-04   1     0.0
2020-12-05   1     0.0
...         ..     ...
2021-05-25   2     9.0
2021-05-26   2     9.0
2021-05-27   2     8.0
2021-05-28   2     8.0
2021-05-29   2     7.0

[136 rows x 2 columns]

Noted. Thank you. BTW the issue is not expanding the dataframe for missing dates and filling them with zeros. They should be filled with the latest non-null if it exists or 0 if it does not. — Larx, Nov 11 '21 at 08:20
Understood. In that case you can use `ffill`. I updated my answer! — Tranbi, Nov 11 '21 at 09:01
It does not extent to unobserved future dates (such as 16-28 Feb of first ID group), however I decided that this is better for my actualy purpose and use your code as is. Thank you. — Larx, Nov 11 '21 at 09:36
I'm glad it helped! Finding the last day of the month is not as trivial as finding the first one but it's only a few lines. If you change your mind check this SO [answer](https://stackoverflow.com/a/13565185/13525512) — Tranbi, Nov 11 '21 at 09:43

Expanding and filling the dataframe for missing dates by each group

1 Answers1