Date values are missing, need to create them and autofill with last known date

Question

I need to achieve the following:

Data:

A B C
9612 2018-05-23 2.1
9612 2018-05-25 3.7
9615 2018-06-03 5.0
9615 2018-06-05 4.7

What I need:

A B C
9612 2018-05-23 2.1
9612 2018-05-24 2.1
9612 2018-05-25 3.7
9615 2018-06-03 5.0
9615 2018-06-04 5.0
9615 2018-06-05 4.7

The idea is that some times specific date values are missing from the data for some specific column 'A' values, so I need to loop through every value in 'B' (for each value of 'A') to find missing date values and create a new one that replicates the one before it.

Any ideas?

Thank you very much in advance!

Best Regards,

Welcome mate. The way to make your question don't help us to reproduce your problem. For this read [How to make good reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), The article [How do I ask a good question](https://stackoverflow.com/help/how-to-ask) can help you improve your question — Terry, Mar 19 '19 at 20:57
you can use shift function to get previous value https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html — Dirk N, Mar 19 '19 at 21:06
also see https://stackoverflow.com/questions/27905295/how-to-replace-nans-by-preceding-values-in-pandas-dataframe?rq=1 — Dirk N, Mar 19 '19 at 21:12
You need to fill all days until you find a valid `C`? or between `min` and `max` days of each month? — Terry, Mar 19 '19 at 22:22
I need to create a new date if it's missing from max and min values of each value in column 'A', for example, 9612 is missing 2018-05-24 so I need to add that value and use the data in 2018-05-23 for it. After that one, 9615 is missing 2018-06-04 so I need to add that one aswell and use the data from 9615 2018-06-03. — Dlkastro92, Mar 19 '19 at 22:48

score 0 · Accepted Answer · answered Mar 19 '19 at 23:19

Find the min and max date of each value in A column. In for loop, create the new dates between using asfreq and append on DataFrame.

df['B'] = pd.to_datetime(df['B'])

df2 = df.copy()
ids = df2.A.unique()
for id in ids:
    temp = df2.loc[df2.A == id, 'B'].agg(['min', 'max'])

    df2 = df2.append(df2.loc[ (df2.B.between(temp['min'], temp['max'])) & 
                              (df2.A == id)
                            ].set_index('B').asfreq('d').reset_index())

After this, remove duplicates, sort by date and fill the NaN values with ffill method

df2 = df2.drop_duplicates().sort_values('B').ffill()

df.head()

A   B   C
0   9612.0  2018-05-23  2.1
1   9612.0  2018-05-24  2.1
1   9612.0  2018-05-25  3.7
2   9615.0  2018-06-03  5.0
1   9615.0  2018-06-04  5.0
3   9615.0  2018-06-05  4.7

That is just perfect! Thank you very much Terry! – Dlkastro92 Mar 20 '19 at 03:15 — Dlkastro92, Mar 20 '19 at 03:15

Date values are missing, need to create them and autofill with last known date

1 Answers1