How to get the duration inside the rolling window from the DatetimeIndex in Pandas

Question

I am trying to calculate the time duration inside each sliding window for this data:

                                ID  
    DATE            
    2017-05-17 15:49:51         2   
    2017-05-17 15:49:52         5   
    2017-05-17 15:49:55         2   
    2017-05-17 15:49:56         3   
    2017-05-17 15:49:58         5
    2017-05-17 15:49:59         5

In this example DATE is the index, and I am trying to get the duration inside overlapping rolling windows of size 3. The answer should look like this:

                                ID      duration    
    DATE            
    2017-05-17 15:49:51         2        4  
    2017-05-17 15:49:52         5        4  
    2017-05-17 15:49:55         2        3  
    2017-05-17 15:49:56         3        3  
    2017-05-17 15:49:58         5        NaN
    2017-05-17 15:49:59         5        NaN

I tried:

df['duration'] = df.rolling(window=3).apply(df.index.max()-df.index.min())

But I got this error:

TypeError: 'DatetimeIndex' object is not callable

try `df['duration'] = df.rolling(window=3).apply(lambda x: x.index.max()-x.index.min())` — jezrael, Sep 11 '17 at 08:46
I did that before, I got this error `AttributeError: 'numpy.ndarray' object has no attribute 'index'` — Ali, Sep 11 '17 at 08:49
Related: https://stackoverflow.com/questions/37486502/why-does-pandas-rolling-use-single-dimension-ndarray — IanS, Sep 11 '17 at 08:53
I also try this `df['duration'] = df.rolling(5).apply(lambda x: pd.to_datetime(x.index.max()) - pd.to_datetime(x.index.min()))` Got the same error `AttributeError: 'numpy.ndarray' object has no attribute 'index'` — Ali, Sep 11 '17 at 08:53
As the linked question explains, `rolling` works on a numpy array, not a dataframe, so you do not have access to all the pandas functionality inside. You have to find a workaround based on array-indexing. — IanS, Sep 11 '17 at 08:57
I tried the `df['duration'] = df.rolling(5).apply(lambda x: pd.Series(x.index.max()) - pd.Series(x.index.min()))`, I got this error: `AttributeError: 'numpy.ndarray' object has no attribute 'index'` — Ali, Sep 11 '17 at 08:59
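
The `AttributeError` reported in the comments above can be reproduced with a small sketch. It assumes a pandas version where `rolling(...).apply` accepts the `raw` argument; the `record_type` helper and the `seen_raw` / `seen_series` lists are hypothetical names used only for illustration.

```python
import pandas as pd

# Sample data from the question, with DATE as a DatetimeIndex
df = pd.DataFrame(
    {"ID": [2, 5, 2, 3, 5, 5]},
    index=pd.DatetimeIndex(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ],
        name="DATE",
    ),
)

seen_raw, seen_series = [], []


def record_type(window, log):
    """Record the type of object that rolling.apply hands to the function."""
    log.append(type(window).__name__)
    return 0.0  # rolling.apply expects a numeric return value


# raw=True: each window is a plain numpy array, so it has no .index
df["ID"].rolling(window=3).apply(lambda w: record_type(w, seen_raw), raw=True)

# raw=False: each window is a Series that keeps its slice of the DatetimeIndex
df["ID"].rolling(window=3).apply(lambda w: record_type(w, seen_series), raw=False)

print(set(seen_raw), set(seen_series))
```

With `raw=True` the window carries no index, which is why `x.index` fails inside the lambda; with `raw=False` the index is available.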

5nv · Accepted Answer · 2017-09-11T09:19:41.977


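import numpy as np

df.reset_index(inplace=True)  # move DATE from the index to a regular column
# Despite its name, this column holds the timestamp two rows ahead (the last row of each 3-row window)
df['PREVIOUS_TIME'] = df.DATE.shift(-2)
df['duration'] = (df.PREVIOUS_TIME - df.DATE) / np.timedelta64(1, 's')  # timedelta -> seconds
df.drop('PREVIOUS_TIME', axis=1, inplace=True)
df.set_index('DATE', inplace=True)  # restore DATE as the index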

Assuming that 'DATE' is a datetime.
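
As a self-contained check of the shift idea, here is a minimal sketch that rebuilds the question's data and computes the same durations without resetting the index. The `times` variable is a name introduced here for illustration, and `index.to_series()` is swapped in for the `reset_index` / `set_index` round trip; it is not part of the original answer.

```python
import pandas as pd

# Sample data from the question, with DATE as a DatetimeIndex
df = pd.DataFrame(
    {"ID": [2, 5, 2, 3, 5, 5]},
    index=pd.DatetimeIndex(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ],
        name="DATE",
    ),
)

# Timestamps as a Series so they can be shifted like a column
times = df.index.to_series()

# For a window of size 3, the last row of the window is 2 rows ahead
df["duration"] = (times.shift(-2) - times).dt.total_seconds()

print(df)
```

The last two rows have no complete window ahead of them, so their duration is NaN, as in the expected output in the question.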

edited Sep 11 '17 at 09:19

answered Sep 11 '17 at 09:02


`DATE` is the index so I can't call `df.DATE.shift(-3)` – Ali Sep 11 '17 at 09:05
`df.reset_index(inplace=True)`; afterwards `df.set_index('DATE', inplace=True)` – 5nv Sep 11 '17 at 09:06
It doesn't give me the answer I'm looking for. The time sliding windows overlap each other if you look at my example: `In Window-1: 15:49:55 - 15:49:51 = 4` `In window-2: 15:49:56 - 15:49:52 = 4` `In window-3: 15:49:58 - 15:49:55 = 3` and so on. – Ali Sep 11 '17 at 09:13
Ah, ok, then you should do shift -2 instead of shift -3 – 5nv Sep 11 '17 at 09:19
Thanks so much, I spent too much time trying to figure this out – Ali Sep 11 '17 at 09:26

Fleaurent · Answer 2 · 2020-07-24T15:17:09.723

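import pandas as pd

def timediff(time_window: pd.Series) -> float:
    # With raw=False the window arrives as a Series, so its DatetimeIndex is available
    duration = time_window.index.max() - time_window.index.min()
    return duration.total_seconds()

# Roll over a column without NaNs: with the default min_periods, windows containing NaN yield NaN
df['duration'] = df['ID'].rolling(window=3).apply(func=timediff, raw=False)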

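I've just stumbled across this question and wanted to provide a solution using the rolling-window approach.
With raw=False (the default in recent pandas versions) the function receives a Series, so you can use index.max() - index.min() or index[-1] - index[0].
The only catch is that the function must return a number, not a timedelta object. Note also that each result is placed on the last row of its window; shifting it by -2 moves it to the first row, as in the question's expected output.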
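
A minimal end-to-end sketch of this approach on the question's data follows. The `window_seconds` helper and the `trailing` variable are hypothetical names, and the final `shift(-2)` is an added step (not in the answer above) that aligns each duration with the first row of its window.

```python
import pandas as pd

# Sample data from the question, with DATE as a DatetimeIndex
df = pd.DataFrame(
    {"ID": [2, 5, 2, 3, 5, 5]},
    index=pd.DatetimeIndex(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ],
        name="DATE",
    ),
)


def window_seconds(window: pd.Series) -> float:
    """Return the time span covered by one window, in seconds."""
    return (window.index[-1] - window.index[0]).total_seconds()


# rolling() places each result on the LAST row of its 3-row window
trailing = df["ID"].rolling(window=3).apply(window_seconds, raw=False)

# Shift up by window - 1 rows so each result sits on the FIRST row instead
df["duration"] = trailing.shift(-2)

print(df)
```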
