0

I'm trying to use rolling window to get the time difference between rows based on condition

my dataset is like

 Time   Type    ConditionA    default index 
 00:00     A        True          0
 00:00     A        Flase         1
 00:00     A        True          2 
 00:01     B        True          3
 00:01     A        True          4
 00:01     B        True          5

My purpose is to get the time differnce between the same type in a rolling window of 10 seconds

if both ConditionA is true.

eg. the time difference of row 5 will be 0, since the row 5 and row 3 are same type, and both conditionA is true

My final dataset will be like this

 Time   Type    ConditionA    default index  Time difference 
 00:00     A        True          0             N/A (or -1 )
 00:00     A        Flase         1             N/A (or -1 )
 00:00     A        True          2             0s
 00:01     B        True          3             N/A (or -1 )
 00:01     A        True          4             1s
 00:01     B        True          5             0s

I tried the following

df.groupby('Type',sort = False).apply(lambda win: win.rolling('10s').apply(test_func))

def test_func(win):

    target_value = win['ConditionA'].values[-1]

    if(len(win)>1 ):

        qualified_rows = win.loc[win['ConditionA'].values == target_value]
        target_row = qualified_rows.iloc[[-2]]
        current_row = win.iloc[[-1]]


        time_difference = current_row.index - target_row.index


        return pd.Series(time_difference ,index= win.iloc[[-1]].index )

    else:

        return pd.Series(-1,index= win.iloc[[-1]].index )

However, it returns

ValueError: cannot reindex from a duplicate axis

This is because I set Time as index, and time has duplication.

I also tried the following

  1. Access mutiple column in window, like this question, but it only apply to integer window, not time window.

  2. I tried to use multiindex, then rolling on Time index, so that I won't get duplicate. but it doesn't work, it shows

    ValueError: window must be an integer
    

How should I solve this problem ? or to achieve similar result, Thank you!

whtitefall
  • 631
  • 9
  • 16

1 Answers1

0

My purpose is to get the time difference between the same type in a rolling window of 10 seconds

If I understood this correctly, you want to do time difference between current and previous row within the same group only if those times are less than 10 seconds apart ?

In that case you could do something like this:

import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame({'Time':[datetime.now(),datetime.now(),datetime.now(),datetime.now(),datetime.now(),datetime.now()],
                   'Type':['A','A','A','B','A','B'],
                   'ConditionA':[True,False,True,True,True,True]})
df['Time shift'] = pd.Series(pd.NaT,index=df.index)
df['Time diff'] = pd.Series(pd.NaT,index=df.index)


for name, group in df.groupby(['Type','ConditionA']):
    df.loc[group.index,'Time shift'] = group['Time'].shift(periods=1) # previous time for each group

The result is something like your example DataFrame :

                  Time           Type   ConditionA  Time shift Time diff
0   2020-03-07 22:38:47.710763     A      True        NaT          NaT
1   2020-03-07 22:38:47.710768     A      False       NaT          NaT
2   2020-03-07 22:38:47.710769     A      True        2020-03-07 22:38:47.710763008     NaT
3   2020-03-07 22:38:47.710769     B      True        NaT          NaT
4   2020-03-07 22:38:47.710770     A      True        2020-03-07 22:38:47.710768896     NaT
5   2020-03-07 22:38:47.710771     B      True        2020-03-07 22:38:47.710768896     NaT

Then for the function, again with groupby:

for name, group in df.groupby(['Type','ConditionA']):
  if name[1]: # If CondiditionA is True
    mask = group[(group['Time'] - group['Time shift']) < timedelta(seconds=10)].index #Row within 10s of their previous one
    df.loc[mask,'Time diff'] = df.loc[mask,'Time'] - df.loc[mask,'Time shift']

Is this the kind of things you were looking for ?

Bruce Swain
  • 583
  • 3
  • 10