0

My data format is the following:

    id  eventSource        eventType                           timestamp
0  102    sensor1          received                   2021-10-06 13:45:51
1  102    sensor1          startingCalculation        2021-10-06 13:45:55
2  102    sensor1          endedCalculation           2021-10-06 13:46:00
3  103    sensor2          received                   2021-10-06 13:46:00
4  103    sensor2          startingCalculation        2021-10-06 13:46:04
5  103    sensor2          endedCalculation           2021-10-06 13:46:10

I need help with the following task:

I want to group by the Id eventSource and eventType and show diffs between the received and endedCalculation events for each of them like so:

    id   eventSource         diff 
0  102      sensor1           9
0  103      sensor2           10

And so on ...

I am kind of stuck in the groupby stage,

Thank you for anyone trying to help

Eitank
  • 570
  • 8
  • 21

1 Answers1

1

try:

m=df['eventType'].isin(['received','endedCalculation'])
#created a boolean mask

Finally pass that mask and use groupby() and agg() method:

out=df[m]
out['diff']=out.groupby(['id','eventSource'])['timestamp'].diff().dt.total_seconds()
out=out.dropna(subset=['diff']).drop('eventType',1).reset_index(drop=True)

output of out:

    id  eventSource     timestamp               diff
0   102     sensor1     2021-10-06 13:46:00     9.0
1   103     sensor2     2021-10-06 13:46:10     10.0
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41