3

I have a dataframe as follows:

    imei    event_type               time   
    1107    alarm                    2020-01-28 11:32:42+00:00  
    1107    alarm_restored           2020-01-28 11:32:53+00:00
    1107    alarm_emergency          2020-01-28 11:33:03+00:00  
    1107    alarm_emergency_restored 2020-01-28 11:33:06+00:00
    1108    alarm                    2020-01-28 11:42:42+00:00  
    1108    alarm_restored           2020-01-28 11:43:53+00:00
    1109    alarm_emergency          2020-01-28 11:53:23+00:00  
    1109    alarm_emergency          2020-01-28 11:53:23+00:00  
    1109    alarm_emergency_restored 2020-01-28 11:57:06+00:00  
    1110    alarm_emergency          2020-01-29 10:23:05+00:00  
    1111    alarm_restored           2020-01-29 11:10:53+00:00  
    1112    alarm_emergency_restored 2020-01-29 12:13:23+00:00  

I want to find the time difference between alarm and restored type events for every user. I have no idea how to proceed with it. I tried calculate the time difference between two consecutive rows in pandas I tried

    df_alarm['time'].diff(3)

and got :

    0                              NaT
    1                              NaT
    2                              NaT
    3           0 days 00:00:23.706000
    4           0 days 00:27:28.364000
    ...

Which is not how I expected the results. I want results in minutes/seconds

UPDATE:

I want to find time difference in every consecutive alarm and alarm_restored, alarm_emergency and alarm_emergency_restored only if they are consecutive rows. All other rows should be NaT.

Expected Output:

    imei    event_type               time                       time_diff
    1107    alarm                    2020-01-28 11:32:42+00:00  NaT
    1107    alarm_restored           2020-01-28 11:32:53+00:00  00:00:11
    1107    alarm_emergency          2020-01-28 11:33:03+00:00  NaT
    1107    alarm_emergency_restored 2020-01-28 11:33:06+00:00  00:00:03
    1108    alarm                    2020-01-28 11:42:42+00:00  NaT
    1108    alarm_restored           2020-01-28 11:43:53+00:00  00:01:11
    1109    alarm_emergency          2020-01-28 11:14:27+00:00  NaT
    1109    alarm_emergency          2020-01-28 11:53:23+00:00  NaT
    1109    alarm_emergency_restored 2020-01-28 11:57:06+00:00  00:03:43
    1110    alarm_emergency          2020-01-29 10:23:05+00:00  NaT
    1111    alarm_restored           2020-01-29 11:10:53+00:00  NaT
    1112    alarm_emergency_restored 2020-01-29 12:13:23+00:00  NaT

As you see, if there are two consecutive alarm_* events and one restoral after that(as in rows 1109-1109), I want to find difference only between row 2 and row 3 for 1109.

Rasika
  • 387
  • 6
  • 19

1 Answers1

1

This will work.

First you get the time difference between all the rows.

df["timediff"] = df.groupby(df.imei)["time"].diff()

Then, you just set timediff to NaT (not a time) for all rows that are not "*_restored" because you don't care about the time from "*_restored" to any other alarm events:

import numpy as np
df["timediff"][df.event_type.str.contains("restored") == False] = np.datetime64('NaT')

This gives exactly what you want.

TYZ
  • 8,466
  • 5
  • 29
  • 60
  • other rows are not irrelevant. I need difference between every type of alarm, for every user. – Rasika Mar 10 '20 at 20:26
  • 1
    @Rasika You said "I want to find the time difference between alarm and restored type events". So you want time difference between alarm and each of `alarm_*`? You should provide your expected output from the example given. – TYZ Mar 10 '20 at 20:30
  • Yes. My bad. I should have been more clear. But thanks for your solution. I am trying it. – Rasika Mar 10 '20 at 21:19
  • I have updated my expected output. Your solution is not what I was expected. I was not very clear in my question. Hence updated it with expected output now. – Rasika Mar 10 '20 at 22:08
  • This wont work for me Have added edits. Please have a look – Rasika Mar 17 '20 at 18:37
  • @Rasika Does `imei` serve as group? In imei 1110, you are calculating time difference between two differen imeis.\ – TYZ Mar 17 '20 at 18:51
  • If that's the case, I don't see why mine is not working. It's working for me with the data you provided. I'm getting exactly the same as your expected. – TYZ Mar 17 '20 at 20:45
  • @Rasika Only thing might be affecting could be I had imei as index, so I was grouping by `df.index`. – TYZ Mar 17 '20 at 20:49
  • This works best after changing to groupby imei. Thanks! – Rasika Mar 18 '20 at 19:01