I have a list of events in which the same event can occur in several separate runs. For instance, event1 might occur three times in a row, then another event occurs, and later on event1 occurs again.
What I need is the interval (start and end time) of each run of consecutive identical events, and the number of occurrences of that event within the interval.
values = {
'2017-11-28 11:00': 'event1',
'2017-11-28 11:01': 'event1',
'2017-11-28 11:02': 'event1',
'2017-11-28 11:03': 'event2',
'2017-11-28 11:04': 'event2',
'2017-11-28 11:05': 'event1',
'2017-11-28 11:06': 'event1',
'2017-11-28 11:07': 'event1',
'2017-11-28 11:08': 'event3',
'2017-11-28 11:09': 'event3',
'2017-11-28 11:10': 'event2',
}
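import pandas as pd
# Build a two-column frame from the dict: keys become 'time', values become 'event'.
df = pd.DataFrame.from_dict(values, orient='index').reset_index()
df.columns = ['time', 'event']
# Parse the timestamp strings, then use them as a sorted index.
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time').sort_index()
df.head()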
The expected result is:
occurrences = [
{'start':'2017-11-28 11:00',
'end':'2017-11-28 11:02',
'event':'event1',
'count':3},
{'start':'2017-11-28 11:03',
'end':'2017-11-28 11:04',
'event':'event2',
'count':2},
{'start':'2017-11-28 11:05',
'end':'2017-11-28 11:07',
'event':'event1',
'count':3},
{'start':'2017-11-28 11:08',
'end':'2017-11-28 11:09',
'event':'event3',
'count':2},
{'start':'2017-11-28 11:10',
'end':'2017-11-28 11:10',
'event':'event2',
'count':1},
]
I was thinking of using pd.merge_asof to find the start/end times of the intervals and then using pd.cut (as explained here) for the groupby and count, but I'm stuck. Any help is appreciated.
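For reference, here is a minimal sketch of a different direction that might work, not the merge_asof/pd.cut idea above: label each run by comparing every row with the previous one (shift plus cumsum), then aggregate per label. The helper name `summarize_runs` and the shortened sample data are my own for illustration, and the named-aggregation syntax assumes pandas 0.25 or later.

```python
import pandas as pd

def summarize_runs(df):
    """Collapse consecutive identical events into one row per run.

    Assumes `df` has a DatetimeIndex named 'time' and a column 'event'.
    """
    flat = df.sort_index().reset_index()  # columns: 'time', 'event'
    # A new run starts wherever the event differs from the previous row;
    # the cumulative sum of those change points gives each run its own label.
    run_id = (flat['event'] != flat['event'].shift()).cumsum().rename('run')
    runs = flat.groupby(run_id).agg(
        start=('time', 'first'),
        end=('time', 'last'),
        event=('event', 'first'),
        count=('event', 'size'),
    )
    return runs.reset_index(drop=True)

# Same sequence as in the question, built compactly (one row per minute).
times = pd.date_range('2017-11-28 11:00', periods=11, freq='min')
events = (['event1'] * 3 + ['event2'] * 2 + ['event1'] * 3
          + ['event3'] * 2 + ['event2'])
df = pd.DataFrame({'event': events}, index=pd.Index(times, name='time'))

# One dict per run with keys 'start', 'end', 'event', 'count'.
occurrences = summarize_runs(df).to_dict('records')
```

Note that start and end come back as pandas Timestamp objects rather than strings; they would need formatting (for example with strftime) to match the expected output exactly.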