I'm trying to use a rolling time window to get the time difference between rows, subject to a condition.
My dataset looks like this:
Time   Type  ConditionA  default index
00:00  A     True        0
00:00  A     False       1
00:00  A     True        2
00:01  B     True        3
00:01  A     True        4
00:01  B     True        5
My goal is to get the time difference between consecutive rows of the same Type within a rolling window of 10 seconds,
counting only rows where ConditionA is True.
E.g. the time difference for row 5 will be 0, since row 5 and row 3 have the same Type and ConditionA is True for both.
My final dataset should look like this:
Time   Type  ConditionA  default index  Time difference
00:00  A     True        0              N/A (or -1)
00:00  A     False       1              N/A (or -1)
00:00  A     True        2              0s
00:01  B     True        3              N/A (or -1)
00:01  A     True        4              1s
00:01  B     True        5              0s
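To pin the rule down, here is a minimal plain-Python sketch that reproduces the expected column above. It is only a reference for the intended behaviour, not the pandas solution I'm after. The names `rows`, `WINDOW` and `time_differences` are made up for illustration, Time is read as minutes:seconds, and treating a gap of exactly 10 seconds as still inside the window is an assumption.

```python
from datetime import timedelta

# (time, type, condition_a) rows from the sample above.
# Assumption: Time is minutes:seconds, so 00:01 is one second.
rows = [
    (timedelta(seconds=0), "A", True),
    (timedelta(seconds=0), "A", False),
    (timedelta(seconds=0), "A", True),
    (timedelta(seconds=1), "B", True),
    (timedelta(seconds=1), "A", True),
    (timedelta(seconds=1), "B", True),
]

WINDOW = timedelta(seconds=10)


def time_differences(rows, window=WINDOW):
    """Return, per row, the gap to the previous qualifying row, or None."""
    last_seen = {}  # Type -> time of the latest row with ConditionA True
    result = []
    for time, kind, cond in rows:
        gap = None
        if cond:
            prev = last_seen.get(kind)
            # Assumption: a gap of exactly `window` still counts as a match.
            if prev is not None and time - prev <= window:
                gap = time - prev
            last_seen[kind] = time
        result.append(gap)
    return result


differences = time_differences(rows)
for index, gap in enumerate(differences):
    print(index, "N/A" if gap is None else gap)
```

Rows where ConditionA is False never get a difference here and are never used as a match for a later row.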
I tried the following:

    def test_func(win):
        target_value = win['ConditionA'].values[-1]
        if len(win) > 1:
            qualified_rows = win.loc[win['ConditionA'].values == target_value]
            target_row = qualified_rows.iloc[[-2]]
            current_row = win.iloc[[-1]]
            time_difference = current_row.index - target_row.index
            return pd.Series(time_difference, index=win.iloc[[-1]].index)
        else:
            return pd.Series(-1, index=win.iloc[[-1]].index)

    df.groupby('Type', sort=False).apply(lambda win: win.rolling('10s').apply(test_func))
However, it raises

    ValueError: cannot reindex from a duplicate axis

This is because I set Time as the index, and Time contains duplicate values.
I also tried the following:
Accessing multiple columns in the window, as in this question, but that only applies to an integer window, not a time window.
Using a MultiIndex and then rolling on the Time level, so that I wouldn't get duplicates, but it doesn't work either; it shows

    ValueError: window must be an integer
How can I solve this problem, or achieve a similar result? Thank you!
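One direction that might sidestep `rolling` entirely, as a sketch rather than a definitive answer: keep Time as an ordinary column (so the default index stays unique), filter to the rows where ConditionA is True, shift Time within each Type, and blank out gaps larger than the window. This assumes the rows are already sorted by Time, that Time is minutes:seconds, and that a gap of exactly 10 seconds still counts as a match. It only compares each row with the immediately preceding qualifying row of the same Type.

```python
import pandas as pd

# Sample data from the question; Time is assumed to be minutes:seconds.
df = pd.DataFrame({
    "Time": pd.to_timedelta(["00:00:00", "00:00:00", "00:00:00",
                             "00:00:01", "00:00:01", "00:00:01"]),
    "Type": ["A", "A", "A", "B", "A", "B"],
    "ConditionA": [True, False, True, True, True, True],
})

# Only rows with ConditionA True can produce or receive a difference.
valid = df[df["ConditionA"]]

# Time of the previous qualifying row of the same Type (NaT if none).
previous = valid.groupby("Type", sort=False)["Time"].shift()
diff = valid["Time"] - previous

# Assumption: gaps longer than the 10-second window count as "no match".
diff = diff.where(diff <= pd.Timedelta(seconds=10))

# Assignment aligns on the unique default index; unmatched rows stay NaT.
df["Time difference"] = diff
print(df)
```

Because the result is aligned on the default integer index instead of on Time, duplicate timestamps should not trigger the reindexing error. The NaT entries can be replaced with -1 afterwards if a numeric sentinel is preferred.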