1

computing the Time Difference based on Two rows, picked upon a condition. First Row is the Current Row, and Second Row is picked upon a condition. Condition can be Value==1 for example.

Illustratively

The Data Frame has Time Stamp and Event Occurred or Not.

enter image description here

Event is [ Master / B / C].

Output DataFrame should Compute, Time Left for Next Event . Eg: Master

Event took place at 12:33. So at 12:30 the Time Left for the Event=3 ie [ 12:33 - 12:30 ] in Minutes. We COuld fill NA if the last next Event is not within the Data Frame time window.

Sample Output

user2458922
  • 1,691
  • 1
  • 17
  • 37

1 Answers1

1

Load a test DataFrame:

df = pd.DataFrame({'year': [2019] * 5,
                   'month': [8] * 5,
                   'day': [16] * 5,
                   'hour': [12, 12, 12, 12, 13],
                   'minute': [1, 2, 3, 4, 5]})

df = pd.DataFrame(pd.to_datetime(df), columns=['Time_Stamp'])
df['Event_Master'] = [0, 0, 1, 0, 1]

Loaded DataFrame looks like:

           Time_Stamp  Event_Master
0 2019-08-16 12:01:00             0
1 2019-08-16 12:02:00             0
2 2019-08-16 12:03:00             1
3 2019-08-16 12:04:00             0
4 2019-08-16 13:05:00             1

To solve the problem, I first add a temporary column called 'Next_Timestamp' which at first just grabs the timestamp when given event is a 1. It is pd.NaT otherwise. We can then use the fillna method to backfill the pd.NaT values with the time of the next event for each row. It now contains the time of the next event for each row. Finally, we just subtract the 'Time_Stamp' column from the 'Next_Timestamp' column.

df['Next_Timestamp'] = df[df.Event_Master == 1].Time_Stamp
df['Next_Timestamp'].fillna(method='backfill', inplace=True)

df['TimeDiff'] = df.Next_Timestamp - df.Time_Stamp

The DataFrame now looks like:

           Time_Stamp  Event_Master      Next_Timestamp TimeDiff
0 2019-08-16 12:01:00             0 2019-08-16 12:03:00 00:02:00
1 2019-08-16 12:02:00             0 2019-08-16 12:03:00 00:01:00
2 2019-08-16 12:03:00             1 2019-08-16 12:03:00 00:00:00
3 2019-08-16 12:04:00             0 2019-08-16 13:05:00 01:01:00
4 2019-08-16 13:05:00             1 2019-08-16 13:05:00 00:00:00

Finally, drop the temporary 'Next_Timestamp' column:

df.drop(['Next_Timestamp'], axis=1, inplace=True)

And the final DataFrame looks like:

           Time_Stamp  Event_Master TimeDiff
0 2019-08-16 12:01:00             0 00:02:00
1 2019-08-16 12:02:00             0 00:01:00
2 2019-08-16 12:03:00             1 00:00:00
3 2019-08-16 12:04:00             0 01:01:00
4 2019-08-16 13:05:00             1 00:00:00

Repeat for other columns as needed. Hope this helps!

Jon Strutz
  • 302
  • 4
  • 9
  • the output comes into a DataType of timeDiff. Can we get that into minutes. ? – user2458922 Aug 28 '19 at 19:06
  • You could do something like `df.TimeDiff.astype('timedelta64[m]')` where the m in brackets means minutes. ([link](https://stackoverflow.com/questions/18215317/extracting-days-from-a-numpy-timedelta64-value)) – Jon Strutz Aug 29 '19 at 19:05