Assume there are two Pandas Series (or DataFrames), each containing different datetime values. For example, one series/frame contains messages and the other contains specific events. I would like to keep only the messages that were posted right after an event, meaning within `n` minutes after any event occurred. How could I do that using Pandas?

(Besides using two nested `for` loops, I am hoping for something more pandas-ish and maybe more efficient, like using `groupby` or similar.)
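For comparison, the nested-loop baseline mentioned above could look roughly like this. It is only a sketch: `filter_after_events_loop` is a hypothetical helper, and it treats the window as inclusive on both ends (`event <= message <= event + n`), which is an assumption because the question does not say how the boundaries should be handled.

```python
import pandas as pd


def filter_after_events_loop(messages, events, n):
    """Hypothetical helper: keep messages posted within `n` after any event.

    Naive baseline that checks every (message, event) pair.
    Assumes an inclusive window: event <= message <= event + n.
    """
    keep = []
    for msg_time in messages["created_at"]:
        relevant = False
        for event_time in events:
            if event_time <= msg_time <= event_time + n:
                relevant = True
                break  # one matching event is enough
        keep.append(relevant)
    # Boolean mask selects the relevant rows; renumber them from 0.
    return messages.loc[keep].reset_index(drop=True)


msgs = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2000-01-01 09:00:00", "2000-01-01 09:03:30", "2000-01-01 09:10:11"]
    ),
    "text": ["before", "after", "too late"],
})
evts = pd.Series(pd.to_datetime(["2000-01-01 09:02:59"]))
result = filter_after_events_loop(msgs, evts, pd.Timedelta("5min"))
print(result["text"].tolist())
```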
Some sample data could be:
```python
import pandas as pd

messages = pd.DataFrame([
    [pd.to_datetime("2000-01-01 09:00:00"), "non-relevant msg 1"],
    [pd.to_datetime("2000-01-01 09:02:11"), "non-relevant msg 2"],
    [pd.to_datetime("2000-01-01 09:03:30"), "relevant msg 1"],
    [pd.to_datetime("2000-01-01 09:04:30"), "relevant msg 2"],
    [pd.to_datetime("2000-01-01 09:10:11"), "non-relevant msg 3"],
    [pd.to_datetime("2000-01-01 10:00:15"), "relevant again 1"],
    [pd.to_datetime("2000-01-01 10:03:15"), "relevant again 2"],
    [pd.to_datetime("2000-01-01 10:07:00"), "non-relevant msg 4"],
], columns=["created_at", "text"])

events = pd.Series([
    pd.to_datetime("2000-01-01 09:02:59"),
    pd.to_datetime("2000-01-01 10:00:00"),
])

n = pd.Timedelta("5min")
```
This should give the following output:

```python
output = pd.DataFrame([
    [pd.to_datetime("2000-01-01 09:03:30"), "relevant msg 1"],
    [pd.to_datetime("2000-01-01 09:04:30"), "relevant msg 2"],
    [pd.to_datetime("2000-01-01 10:00:15"), "relevant again 1"],
    [pd.to_datetime("2000-01-01 10:03:15"), "relevant again 2"],
], columns=["created_at", "text"])
```
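One vectorized option, offered as a sketch rather than a definitive answer, is `pd.merge_asof`. With `direction="backward"` it finds, for each message, the most recent event at or before the message timestamp, and `tolerance=n` discards matches that lie further back than `n`. Only the nearest preceding event matters: if it is more than `n` earlier, every older event is too. The names `events_df`, `matched` and the `event_at` column are illustrative, and the cast of both keys to `datetime64[ns]` is an assumption added so that the merge keys have identical dtypes across pandas versions.

```python
import pandas as pd

# Sample data from the question, written column-wise for brevity.
messages = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2000-01-01 09:00:00", "2000-01-01 09:02:11", "2000-01-01 09:03:30",
        "2000-01-01 09:04:30", "2000-01-01 09:10:11", "2000-01-01 10:00:15",
        "2000-01-01 10:03:15", "2000-01-01 10:07:00",
    ]),
    "text": [
        "non-relevant msg 1", "non-relevant msg 2", "relevant msg 1",
        "relevant msg 2", "non-relevant msg 3", "relevant again 1",
        "relevant again 2", "non-relevant msg 4",
    ],
})
events = pd.Series(pd.to_datetime(["2000-01-01 09:02:59", "2000-01-01 10:00:00"]))
n = pd.Timedelta("5min")

# merge_asof requires both sides to be sorted on their keys;
# casting to datetime64[ns] keeps the key dtypes identical.
msgs_sorted = messages.assign(
    created_at=messages["created_at"].astype("datetime64[ns]")
).sort_values("created_at")
events_df = events.astype("datetime64[ns]").sort_values().to_frame("event_at")

# For each message, attach the latest event at or before it,
# but only if that event is at most `n` earlier (otherwise NaT).
matched = pd.merge_asof(
    msgs_sorted,
    events_df,
    left_on="created_at",
    right_on="event_at",
    direction="backward",
    tolerance=n,
)

# Keep messages that found an event within the window; drop the helper column.
result = (
    matched[matched["event_at"].notna()]
    .drop(columns="event_at")
    .reset_index(drop=True)
)
print(result)
```

Compared with the nested loops, which visit every (message, event) pair, the work here is dominated by the two sorts. The result comes back in timestamp order because of the sort `merge_asof` requires.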