
I have the following pandas df, with date as index:

              S&P500    Europe     Japan
Date                                    
2002-12-23  0.247683  0.245252  0.203916
2002-12-24  0.241855  0.237858  0.200971
2002-12-26  0.237095  0.230614  0.197621
2002-12-27  0.241104  0.250323  0.191855

I need to add a new row for each missing date (considering the dates between the first and the last date of the df). For the new rows, the column values should be forward filled. This is the expected output (with 2002-12-25 added):

              S&P500    Europe     Japan
Date                                    
2002-12-23  0.247683  0.245252  0.203916
2002-12-24  0.241855  0.237858  0.200971
2002-12-25  0.241855  0.237858  0.200971
2002-12-26  0.237095  0.230614  0.197621
2002-12-27  0.241104  0.250323  0.191855

I created a list of all the dates between the first and the last date:

min_date=df.index.min()
max_date=df.index.max()
date_list=pd.date_range(min_date,max_date,freq='D')

Is there a way to check which dates in "date_list" are not in the df index and add rows for them? The columns of the new rows should be filled with NaN, so that I can forward fill them later. Thanks

younggotti

1 Answer


You can use .reindex to add the missing dates as NaN rows, then .ffill() to forward fill them:

min_date = df.index.min()
max_date = df.index.max()
date_list = pd.date_range(min_date, max_date, freq="D")

df = df.reindex(date_list).ffill()
print(df)

Prints:

              S&P500    Europe     Japan
2002-12-23  0.247683  0.245252  0.203916
2002-12-24  0.241855  0.237858  0.200971
2002-12-25  0.241855  0.237858  0.200971
2002-12-26  0.237095  0.230614  0.197621
2002-12-27  0.241104  0.250323  0.191855
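
To answer the "which dates are missing" part directly, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and uses Index.difference to list the absent dates before reindexing. The names missing, with_gaps and filled are only illustrative and are not from the original post.

```python
import pandas as pd

# Sample data copied from the question; 2002-12-25 is absent.
df = pd.DataFrame(
    {
        "S&P500": [0.247683, 0.241855, 0.237095, 0.241104],
        "Europe": [0.245252, 0.237858, 0.230614, 0.250323],
        "Japan": [0.203916, 0.200971, 0.197621, 0.191855],
    },
    index=pd.to_datetime(
        ["2002-12-23", "2002-12-24", "2002-12-26", "2002-12-27"]
    ).rename("Date"),
)

# Every calendar day between the first and last date, inclusive.
date_list = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Dates that are in date_list but not in the df index.
missing = date_list.difference(df.index)

# reindex adds an all-NaN row for each missing date ...
with_gaps = df.reindex(date_list)
# ... and ffill copies the previous row's values into those rows.
filled = with_gaps.ffill()

print(missing)
print(filled)
```

Keeping with_gaps around is optional; it is only there to show the intermediate NaN rows the question asked about.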

Alternatively, pass the method= parameter to .reindex:

df = df.reindex(date_list, method="ffill")
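
The two variants give the same result on the sample data, but they are not always equivalent. As far as I know, reindex(method="ffill") fills each new label by looking up the previous existing label, so a NaN that is already stored in the frame is left alone (and copied forward), while .ffill() fills every NaN. The sketch below uses a made-up frame with one pre-existing NaN to show the difference. The frame and the names a and b are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 2002-12-24 exists but holds NaN; 2002-12-25 is absent.
idx = pd.to_datetime(["2002-12-23", "2002-12-24", "2002-12-26"])
df = pd.DataFrame({"Japan": [0.203916, np.nan, 0.197621]}, index=idx)
date_list = pd.date_range(idx.min(), idx.max(), freq="D")

# Variant 1: add NaN rows, then forward fill every NaN in the frame.
a = df.reindex(date_list).ffill()

# Variant 2: fill only while reindexing; the stored NaN on 2002-12-24
# is kept and is also what gets carried into the new 2002-12-25 row.
b = df.reindex(date_list, method="ffill")

print(a)
print(b)
```

If the original data can contain NaN values, pick the variant that matches what you want to happen to them. Also note that method= requires the existing index to be sorted (monotonic).
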
Andrej Kesely