Pandas DataFrame, computing the Time Difference between one row and other row which satisfies a condition

Question

computing the Time Difference based on Two rows, picked upon a condition. First Row is the Current Row, and Second Row is picked upon a condition. Condition can be Value==1 for example.

Illustratively

The Data Frame has Time Stamp and Event Occurred or Not.

Event is [ Master / B / C].

Output DataFrame should Compute, Time Left for Next Event . Eg: Master

Event took place at 12:33. So at 12:30 the Time Left for the Event=3 ie [ 12:33 - 12:30 ] in Minutes. We COuld fill NA if the last next Event is not within the Data Frame time window.

Jon Strutz · Accepted Answer · 2019-08-18T18:51:32.997

Load a test DataFrame:

df = pd.DataFrame({'year': [2019] * 5,
                   'month': [8] * 5,
                   'day': [16] * 5,
                   'hour': [12, 12, 12, 12, 13],
                   'minute': [1, 2, 3, 4, 5]})

df = pd.DataFrame(pd.to_datetime(df), columns=['Time_Stamp'])
df['Event_Master'] = [0, 0, 1, 0, 1]

Loaded DataFrame looks like:

           Time_Stamp  Event_Master
0 2019-08-16 12:01:00             0
1 2019-08-16 12:02:00             0
2 2019-08-16 12:03:00             1
3 2019-08-16 12:04:00             0
4 2019-08-16 13:05:00             1

To solve the problem, I first add a temporary column called 'Next_Timestamp' which at first just grabs the timestamp when given event is a 1. It is pd.NaT otherwise. We can then use the fillna method to backfill the pd.NaT values with the time of the next event for each row. It now contains the time of the next event for each row. Finally, we just subtract the 'Time_Stamp' column from the 'Next_Timestamp' column.

df['Next_Timestamp'] = df[df.Event_Master == 1].Time_Stamp
df['Next_Timestamp'].fillna(method='backfill', inplace=True)

df['TimeDiff'] = df.Next_Timestamp - df.Time_Stamp

The DataFrame now looks like:

           Time_Stamp  Event_Master      Next_Timestamp TimeDiff
0 2019-08-16 12:01:00             0 2019-08-16 12:03:00 00:02:00
1 2019-08-16 12:02:00             0 2019-08-16 12:03:00 00:01:00
2 2019-08-16 12:03:00             1 2019-08-16 12:03:00 00:00:00
3 2019-08-16 12:04:00             0 2019-08-16 13:05:00 01:01:00
4 2019-08-16 13:05:00             1 2019-08-16 13:05:00 00:00:00

Finally, drop the temporary 'Next_Timestamp' column:

df.drop(['Next_Timestamp'], axis=1, inplace=True)

And the final DataFrame looks like:

           Time_Stamp  Event_Master TimeDiff
0 2019-08-16 12:01:00             0 00:02:00
1 2019-08-16 12:02:00             0 00:01:00
2 2019-08-16 12:03:00             1 00:00:00
3 2019-08-16 12:04:00             0 01:01:00
4 2019-08-16 13:05:00             1 00:00:00

Repeat for other columns as needed. Hope this helps!

the output comes into a DataType of timeDiff. Can we get that into minutes. ? — user2458922, Aug 28 '19 at 19:06
You could do something like `df.TimeDiff.astype('timedelta64[m]')` where the m in brackets means minutes. ([link](https://stackoverflow.com/questions/18215317/extracting-days-from-a-numpy-timedelta64-value)) — Jon Strutz, Aug 29 '19 at 19:05

Pandas DataFrame, computing the Time Difference between one row and other row which satisfies a condition

1 Answers1

Linked