I would like to group events that happened close together into a parent event.
An example is this:
import pandas as pd
df = pd.DataFrame(
[
['2020-01-01 10:00', '1'],
['2020-01-01 10:01', '2'],
['2020-01-01 10:02', '3a'],
['2020-01-01 10:02', '3b'],
['2020-01-01 10:30', '4'],
['2020-01-01 10:50', '5'],
['2020-01-01 10:54', '6'],
['2020-01-01 10:55', '7'],
], columns=['event_time', 'event_id']
)
df['event_time'] = pd.to_datetime(df['event_time'])
In the above example, with a window size of 1 minute, what I want for each event is a list of all other events that occurred within ±1 minute of it.
So something like:
df = pd.DataFrame(
[
['2020-01-01 10:00', '1', ['2']],
['2020-01-01 10:01', '2', ['1','3a','3b']],
['2020-01-01 10:02', '3a', ['2','3b']],
['2020-01-01 10:02', '3b', ['3a', '2']],
['2020-01-01 10:30', '4', None],
['2020-01-01 10:50', '5', None],
['2020-01-01 10:54', '6', ['7']],
['2020-01-01 10:55', '7', ['6']],
], columns=['event_time', 'event_id', 'related_event_id_list']
)
I was almost able to get close with pandas merge_asof:
pd.merge_asof(df,df, on="event_time", tolerance=pd.Timedelta("1m"), allow_exact_matches=False, direction='nearest')
but it seems to merge each row only to its single nearest event. I could not find an option to merge all events within the tolerance for each row (which would obviously give more rows in my results, so it's more like an outer join on a range).
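To make the "outer join on a range" idea concrete, here is a minimal brute-force sketch, assuming pandas 1.2 or later (for `how="cross"`). The names `window`, `pairs`, `close`, and `related` are made up for illustration. The cross join is quadratic in the number of rows, so this only shows the result I am after on a small frame; it is not meant as a scalable solution.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['2020-01-01 10:00', '1'],
        ['2020-01-01 10:01', '2'],
        ['2020-01-01 10:02', '3a'],
        ['2020-01-01 10:02', '3b'],
        ['2020-01-01 10:30', '4'],
        ['2020-01-01 10:50', '5'],
        ['2020-01-01 10:54', '6'],
        ['2020-01-01 10:55', '7'],
    ], columns=['event_time', 'event_id']
)
df['event_time'] = pd.to_datetime(df['event_time'])

# Hypothetical window size; events at most this far apart count as related.
window = pd.Timedelta(minutes=1)

# Pair every event with every event (assumes pandas >= 1.2 for how="cross").
pairs = df.merge(df, how="cross", suffixes=("", "_other"))

# Keep pairs of distinct events whose times differ by at most the window.
close = pairs[
    (pairs["event_id"] != pairs["event_id_other"])
    & ((pairs["event_time"] - pairs["event_time_other"]).abs() <= window)
]

# Collect the related ids per event into a list.
related = close.groupby("event_id")["event_id_other"].agg(list)

# Events with no neighbours get NaN here rather than None.
result = df.assign(related_event_id_list=df["event_id"].map(related))
print(result)
```

The order of ids inside each list follows the row order of `df`, so it can differ from the hand-written expected output above.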
I was thinking that grouping time-series events like this must be fairly common, but I could not find any obvious function in pandas to do it, or maybe I'm missing something.
As always, I'm trying to avoid writing the code myself via loops or anything like that if I can :)