For each row, I would like to get the average and max of cost over a certain period of time into the future. My dataframe has a datetime column and a cost column.

Here's how I'm getting the past:

df.rolling('5d', on='datetime')['cost'].mean()

Works great, but I need to do the same for the future too.

I found this question: pandas rolling window mean in the future

But that is really a workaround, since it shifts rows. (That is what shift does, right? Or does it shift through time when you're working with time-based data?)

My data has irregular times in each row. If I shift a few rows I might get data from weeks in the future instead of a few days (or hours).
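
To make the concern concrete, here is a minimal sketch on hypothetical sample data (the dates and costs are made up for illustration). Called with only a number of periods, `shift` moves values by row position, so the time gap it jumps over depends entirely on the row spacing. Passing `freq=` to `shift` instead moves the index labels of a datetime-indexed object rather than looking up values at a later time, so it does not solve this either.

```python
import pandas as pd

# Hypothetical sample data with irregular gaps between rows.
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-20", "2021-02-15"]
    ),
    "cost": [10.0, 20.0, 30.0, 40.0],
})

# shift(-2) pulls the value from two rows further down,
# no matter how far apart those rows are in time.
df["cost_2_rows_ahead"] = df["cost"].shift(-2)

# How far into the future did that jump actually reach?
df["gap_to_that_row"] = df["datetime"].shift(-2) - df["datetime"]

print(df)
```

In this sample the same `shift(-2)` reaches 19 days ahead for the first row and 44 days ahead for the second.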

Is there a more reliable way that works purely off the date?

I thought this might work, seems intuitive to me:

df.rolling('-3d', on='datetime')['cost'].mean()  # is this what shift(-3) actually attempts to do?

That didn't work.

I suppose if I get desperate I could make a 'reverse' datetime column and use that, but that's icky.

MetaStack

1 Answer

I just reversed the time axis, and I'll roll over that reversed column to get the future.

This is as yet untested, but I'll try it out tomorrow:

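import pandas as pd

# Assumes df has a unique index and is sorted by 'datetime' in ascending order.
# Mirror every timestamp around the latest one, so the future of a row becomes its past.
latest = df['datetime'].max()
df['future'] = latest + (latest - df['datetime'])

# Reverse the row order so 'future' is ascending, as a time-based rolling window expects.
# Reversing only the datetime values in place would pair each cost with the wrong time.
reversed_df = df.iloc[::-1]

# A trailing 3-day window over mirrored time covers [t, t + 3 days) in real time.
# The result keeps the original index labels, so it aligns when assigned back to df.
df['days_3_outcome'] = reversed_df.rolling('3d', on='future')['cost'].mean()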

I wish there was a better way.
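
One possible alternative that avoids the mirrored column is to find the end of each forward window directly from the timestamps with `numpy.searchsorted`. The sketch below is only an illustration: `forward_window_stats` and its parameter names are hypothetical, the sample data is made up, and it assumes the frame is sorted ascending by time with no missing costs. Each window starts at the row itself, so earlier rows that share the same timestamp are not included.

```python
import numpy as np
import pandas as pd


def forward_window_stats(df, window="3d", time_col="datetime", value_col="cost"):
    """Mean and max of value_col over [t, t + window) for each row's time t.

    Hypothetical helper: assumes df is sorted ascending by time_col.
    """
    times = df[time_col].to_numpy()
    values = df[value_col].to_numpy(dtype=float)
    step = pd.Timedelta(window).to_timedelta64()

    # For each row, the position of the first row at or after t + window.
    ends = np.searchsorted(times, times + step, side="left")

    means = np.empty(len(df))
    maxes = np.empty(len(df))
    for start, end in enumerate(ends):
        chunk = values[start:end]  # never empty: it always contains the row itself
        means[start] = chunk.mean()
        maxes[start] = chunk.max()

    return pd.DataFrame({"future_mean": means, "future_max": maxes}, index=df.index)


# Hypothetical sample data with irregular gaps between rows.
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-20", "2021-02-15"]
    ),
    "cost": [10.0, 20.0, 30.0, 40.0],
})

out = forward_window_stats(df, window="3d")
print(df.join(out))
```

The Python loop keeps the sketch easy to read; for a large frame the mean could be computed from cumulative sums instead, while the max would still need a per-window pass or a different approach.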

MetaStack