-1

I need to achieve the following:

Data:

A B C
9612 2018-05-23 2.1
9612 2018-05-25 3.7
9615 2018-06-03 5.0
9615 2018-06-05 4.7

What I need:

A B C
9612 2018-05-23 2.1
9612 2018-05-24 2.1
9612 2018-05-25 3.7
9615 2018-06-03 5.0
9615 2018-06-04 5.0
9615 2018-06-05 4.7

The idea is that some times specific date values are missing from the data for some specific column 'A' values, so I need to loop through every value in 'B' (for each value of 'A') to find missing date values and create a new one that replicates the one before it.

Any ideas?

Thank you very much in advance!

Best Regards,

Terry
  • 2,761
  • 2
  • 14
  • 28
Dlkastro92
  • 27
  • 5
  • 1
    Welcome mate. The way to make your question don't help us to reproduce your problem. For this read [How to make good reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), The article [How do I ask a good question](https://stackoverflow.com/help/how-to-ask) can help you improve your question – Terry Mar 19 '19 at 20:57
  • Hi Terry, thanks for the reply, I edited the post. – Dlkastro92 Mar 19 '19 at 21:05
  • column A is not a date – Dirk N Mar 19 '19 at 21:05
  • you can use shift function to get previous value https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html – Dirk N Mar 19 '19 at 21:06
  • also see https://stackoverflow.com/questions/27905295/how-to-replace-nans-by-preceding-values-in-pandas-dataframe?rq=1 – Dirk N Mar 19 '19 at 21:12
  • You need to fill all days until you find a valid `C`? or between `min` and `max` days of each month? – Terry Mar 19 '19 at 22:22
  • Edited for column 'B' – Dlkastro92 Mar 19 '19 at 22:45
  • I need to create a new date if it's missing from max and min values of each value in column 'A', for example, 9612 is missing 2018-05-24 so I need to add that value and use the data in 2018-05-23 for it. After that one, 9615 is missing 2018-06-04 so I need to add that one aswell and use the data from 9615 2018-06-03. – Dlkastro92 Mar 19 '19 at 22:48

1 Answers1

0

Find the min and max date of each value in A column. In for loop, create the new dates between using asfreq and append on DataFrame.

df['B'] = pd.to_datetime(df['B'])

df2 = df.copy()
ids = df2.A.unique()
for id in ids:
    temp = df2.loc[df2.A == id, 'B'].agg(['min', 'max'])

    df2 = df2.append(df2.loc[ (df2.B.between(temp['min'], temp['max'])) & 
                              (df2.A == id)
                            ].set_index('B').asfreq('d').reset_index())

After this, remove duplicates, sort by date and fill the NaN values with ffill method

df2 = df2.drop_duplicates().sort_values('B').ffill()

df.head()

A   B   C
0   9612.0  2018-05-23  2.1
1   9612.0  2018-05-24  2.1
1   9612.0  2018-05-25  3.7
2   9615.0  2018-06-03  5.0
1   9615.0  2018-06-04  5.0
3   9615.0  2018-06-05  4.7
Terry
  • 2,761
  • 2
  • 14
  • 28