I have a dataframe with two columns containing date ranges.
The original data looks something like this:
df1:
personid, visitto, intime, outtime
8, 8, 2017-07-01 06:00:00, 2017-07-01 08:00:00
17, 8, 2017-07-02 09:00:00, 2017-07-02 10:00:00
df2:
location, open, close
8, 2017-07-01 04:00:00, 2017-07-01 13:00:00
Here's what I have done so far:
1. Merge the two dataframes on visitto -> location:
merged_df:
personid, visitto, intime, outtime, location, open, close
8, 8, 2017-07-01 06:00:00, 2017-07-01 08:00:00, 8, 2017-07-01 04:00:00, 2017-07-01 13:00:00
17, 8, 2017-07-02 09:00:00, 2017-07-02 10:00:00, 8, 2017-07-01 04:00:00, 2017-07-01 13:00:00
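For reference, a minimal sketch of this merge step, rebuilding the sample frames from the question (same column names as above; only the two sample rows are used):

```python
import pandas as pd

# Rebuild the sample data from the question, parsing the dates as Timestamps.
df1 = pd.DataFrame({
    "personid": [8, 17],
    "visitto": [8, 8],
    "intime": pd.to_datetime(["2017-07-01 06:00:00", "2017-07-02 09:00:00"]),
    "outtime": pd.to_datetime(["2017-07-01 08:00:00", "2017-07-02 10:00:00"]),
})
df2 = pd.DataFrame({
    "location": [8],
    "open": pd.to_datetime(["2017-07-01 04:00:00"]),
    "close": pd.to_datetime(["2017-07-01 13:00:00"]),
})

# Step 1: inner-join visits to locations on visitto -> location.
merged_df = pd.merge(df1, df2, how="inner", left_on="visitto", right_on="location")
print(merged_df)
```

Both sample rows match location 8, so the merge yields two rows with seven columns, as shown above.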
2. Convert the four datetime columns to two pandas interval columns:
personid, visitto, visittime, opentime
8, 8, [2017-07-01 06:00:00, 2017-07-01 08:00:00], [2017-07-01 04:00:00, 2017-07-01 13:00:00]
17, 8, [2017-07-02 09:00:00, 2017-07-02 10:00:00], [2017-07-01 04:00:00, 2017-07-01 13:00:00]
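The interval-conversion step can be sketched with `pd.IntervalIndex.from_arrays`, which builds an interval column directly from a pair of start/end datetime columns (the merged frame is rebuilt inline here so the sketch is self-contained):

```python
import pandas as pd

# Merged frame from step 1, rebuilt from the question's sample rows.
merged_df = pd.DataFrame({
    "personid": [8, 17],
    "visitto": [8, 8],
    "intime": pd.to_datetime(["2017-07-01 06:00:00", "2017-07-02 09:00:00"]),
    "outtime": pd.to_datetime(["2017-07-01 08:00:00", "2017-07-02 10:00:00"]),
    "open": pd.to_datetime(["2017-07-01 04:00:00"] * 2),
    "close": pd.to_datetime(["2017-07-01 13:00:00"] * 2),
})

# Step 2: collapse each start/end pair into a single interval-dtype column.
merged_df["visittime"] = pd.IntervalIndex.from_arrays(
    merged_df["intime"], merged_df["outtime"], closed="both"
)
merged_df["opentime"] = pd.IntervalIndex.from_arrays(
    merged_df["open"], merged_df["close"], closed="both"
)
result = merged_df[["personid", "visitto", "visittime", "opentime"]]
print(result)
```

`closed="both"` is an assumption here — it treats the endpoints as inclusive, matching the bracketed `[start, end]` notation above.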
I now want to filter this merged dataframe of intervals to only return rows where the datetimes overlap. My expected result would be:
personid, visitto, visittime, opentime
8, 8, [2017-07-01 06:00:00, 2017-07-01 08:00:00], [2017-07-01 04:00:00, 2017-07-01 13:00:00]
I was able to get the desired outcome using iterrows() and comparing the overlaps row by row; however, this is extremely verbose. What I would prefer is something like (non-working example):
merged_df.loc[merged_df['visittime'].overlaps(merged_df['opentime'])]
Or, accomplish this during the merge of the dataframes. Something along the lines of (non-working example):
merge_df = pd.merge(df1[["personid", "visitto", "intime", "outtime"]], df2[["location", "open", "close"]], how='inner', left_on='visitto', right_on='location') #WHERE intime, outtime OVERLAPS open, close
Is it possible to perform this operation with a pandas function rather than looping? This is easy to do in SQL in the database, but I am struggling to find an efficient way to perform the same operation on the dataframe.
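One loop-free sketch (an assumption about the approach, not the only option): two closed intervals [a, b] and [c, d] overlap exactly when a <= d and c <= b, so the filter can be expressed as plain vectorized column comparisons on the merged frame, without building Interval objects at all. As far as I know, `IntervalArray.overlaps` only accepts a single scalar `Interval`, not an aligned column, which is why the `merged_df['visittime'].overlaps(...)` form above does not work row-wise.

```python
import pandas as pd

# Merged frame from step 1, rebuilt from the question's sample rows.
merged_df = pd.DataFrame({
    "personid": [8, 17],
    "visitto": [8, 8],
    "intime": pd.to_datetime(["2017-07-01 06:00:00", "2017-07-02 09:00:00"]),
    "outtime": pd.to_datetime(["2017-07-01 08:00:00", "2017-07-02 10:00:00"]),
    "open": pd.to_datetime(["2017-07-01 04:00:00"] * 2),
    "close": pd.to_datetime(["2017-07-01 13:00:00"] * 2),
})

# [intime, outtime] overlaps [open, close] iff intime <= close and open <= outtime.
mask = (merged_df["intime"] <= merged_df["close"]) & (
    merged_df["open"] <= merged_df["outtime"]
)
result = merged_df.loc[mask]
print(result)
```

On the sample data this keeps only the personid-8 row, since the second visit falls entirely after closing time. The same condition can also be applied immediately after `pd.merge`, which mirrors the SQL `WHERE ... OVERLAPS` pattern described above.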