1

I already asked a related question filling gaps in time series
Fill Gaps in time series pandas dataframe
and Akshay Sehgal was kind enough to give a good a detailed answer!

However I found another problem with my data.

Tha following code now works fine in filling the gaps as long as there a time stamps for the beginning and ending of a trading day.
For example I want to fill all gaps in the timeseries between 09:30 and 16:00. As long as there is a timestamp in the data starting at 09:30 and ending at 16:00 all the gaps within this time are filled by resample().
However if the data for the current day starts at 9:45 the resample function will start filling the gaps from this time onwards.
But it will not generate new timestamps from 09:30 to 09:40 (If we consider a 5 Minute intervall)

This is the code I currently use:

# create new col FillDate from the timestamp (we need this to group the data (otherwise resample would also create new dats and not only times))
df_process['FillDate'] = df_process['Exchange DateTime'].dt.date
# set timestamp as index
df_process.set_index('Exchange DateTime', inplace=True)

# group by for each date, resample missing timestamps and forward fill values
df_process = df_process.groupby('FillDate').resample(rule=update_interval).ffill()

# reset the index and delete the colume Fill Date
df_process_out = df_process.reset_index('FillDate', drop=True).drop('FillDate',1)

However I would like to resample always in the fixed time intervall 09:30 to 16:00 regardless if there is a timestamp available at 09:30 or 16:00.

Any ideas how I can solve this in an efficient way?

Any help/guidance would be highly appreciated Thanks

Chris Bauer
  • 63
  • 1
  • 6

1 Answers1

1

I case anyone is interested I think I found a solution:

   # group the time sires by dates (using the FillDate Column) and than apply 
   # the "Reindex_by_Date" Function to generate the index for each date in the
   # given time frame and fill missing tim stamps 
   df_process = df_process.groupby('FillDate').apply(reindex_by_date, intervall=update_interval)

   #drop the helper index "FillDate"
   df_process = df_process.reset_index('FillDate', drop=True).drop('FillDate',1)

   # since we reindexed by each date only it can happen that if there is a value missing
   # on the boarder of the index (e.g. last or fist entry) it might have NaN as value
   # we fix this here (we forward fill for example taking the last value from the previous day)
   df_process_out = df_process.fillna(method='ffill')


   # Helper Function for Closing data gaps with Pandas Groupby and resample
   def reindex_by_date(df, intervall):
        start_range = df.index.date.min().strftime('%Y-%m-%d') +" 09:30:00"
        end_range = df.index.date.max().strftime('%Y-%m-%d') +" 16:00:00"
        dates = pd.date_range(start_range, end_range, freq=intervall)
        return df.reindex(dates).ffill()here

Comments are very welcome or if someone has a more efficient solution I would be very interested. Thanks

Chris Bauer
  • 63
  • 1
  • 6