3

I have multivariate time-series/panel data in the following simplified format:

id,date,event_ind
1,2014-01-01,0
1,2014-01-02,1
1,2014-01-03,1
2,2014-01-01,1
2,2014-01-02,1
2,2014-01-03,1
3,2014-01-01,0
3,2014-01-02,0
3,2014-01-03,1

For this simplified example, I would like the future 2 day sum of event_ind grouped by id

For some reason adapting this example still gives me the "index is not monotonic error": how to do forward rolling sum in pandas?

Here is my approach which otherwise worked for past rolling by group before I adapted it:

df.sort_values(['id','date'], ascending=[True,True], inplace=True)
df.reset_index(drop=True, inplace=True)

df['date'] = pd.DatetimeIndex(df['date'])
df.set_index(['date'], drop=True, inplace=True)

rolling_forward_2_day = lambda x: x.iloc[::-1].rolling('2D').sum().shift(1).iloc[::-1]
df['future_2_day_total'] = df.groupby(['id'], sort=False)['event_ind'].transform(rolling_forward_2_day)
df.reset_index(drop=False, inplace=True)

Here is the expected result:

   id        date  event_ind  future_2_day_total
0   1  2014-01-01          0                   2
1   1  2014-01-02          1                   1
2   1  2014-01-03          1                   0
3   2  2014-01-01          1                   2
4   2  2014-01-02          1                   1
5   2  2014-01-03          1                   0
6   3  2014-01-01          0                   1
7   3  2014-01-02          0                   1
8   3  2014-01-03          1                   0

Any tips on what I might be doing wrong or high-performance alternatives would be great!

EDIT:

One quick clarification. This example is simplified and valid solutions need to be able to handle unevenly spaced/irregular time series which is why rolling with a time-based index is utilized.

Pylander
  • 1,531
  • 1
  • 17
  • 36

1 Answers1

1

You can still use rolling here, but use it with the flag win_type='boxcar' and shift your data around before and after you sum:

df['future_day_2_total'] = (
    df.groupby('id').event_ind.shift(-1)
    .fillna(0).groupby(df.id).rolling(2, win_type='boxcar')
    .sum().shift(-1).fillna(0)
)

   id        date  event_ind  future_day_2_total
0   1  2014-01-01          0                 2.0
1   1  2014-01-02          1                 1.0
2   1  2014-01-03          1                 0.0
3   2  2014-01-01          1                 2.0
4   2  2014-01-02          1                 1.0
5   2  2014-01-03          1                 0.0
6   3  2014-01-01          0                 1.0
7   3  2014-01-02          0                 1.0
8   3  2014-01-03          1                 0.0
user3483203
  • 50,081
  • 9
  • 65
  • 94
  • "boxcar" Excellent. Do you have a good resource that explains exactly what is the boxcar win-type is and maybe the other win_types? +1 – Scott Boston Jul 31 '18 at 17:52
  • Thanks for your contribution! Unfortunately, this clever use of shift() will only work for this simplified example, but not for my actual task. In my actual task, I am rolling on a 365 day in the future basis. – Pylander Jul 31 '18 at 17:52
  • @Pylander I'd have to test it, but this should still work, you will just have to change how much you shift at the end. – user3483203 Jul 31 '18 at 17:54
  • @ScottBoston I only knew about boxcar from when I've used it with `SciPy`. The pandas documentation on the different window types is severely lacking. I might consider submitting a pull request to the documenation if I have time to put some examples together. – user3483203 Jul 31 '18 at 17:56
  • @ user3483203 I think the issue is that in my real case that these are actually not clean and complete time series, so I cannot use shift and need to use datetime index. – Pylander Jul 31 '18 at 17:58
  • Ah, If the dates are sparse that would definitely be an issue :S – user3483203 Jul 31 '18 at 17:59
  • @Pylander In recent versions of Pandas, if you set `date` as the index, you can use frequency specifiers like `365d` or `Y` (for "year"). See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html – shadowtalker Sep 17 '20 at 17:08