Calculating overlap between date ranges in a pandas dataframe

Question

I'm trying to find the overlapping time, in minutes, between two sets of date ranges. To avoid for loops, I chose not to use min/max when determining it. Instead, I developed the function below to find the total number of overlapping minutes between each EventA and the EventB rows, with multiple overlaps being possible.

I believe this works, but since it's rather unconventional I wanted to ask the community to review it. Are there any situations in which this method fails, or reasons I should be wary of using it? Feedback is appreciated (I'm a Stack Overflow newbie, so feel free to let me know if I should format the request differently as well).

import pandas as pd

df_EventA= pd.DataFrame()
df_EventB= pd.DataFrame()

df_EventA['EventAStart'] = pd.Series(pd.to_datetime(['20200101 9:30','20200101 10:30:00', '20200101 11:30:00', '20200101 12:30:00', '20200101 13:30:00', '20200101 14:30:00','20200101 15:30:00','20200101 16:30:00']))
df_EventA['EventAEnd'] = pd.Series(pd.to_datetime(['20200101 10:00','20200101 11:00:00', '20200101 12:00:00', '20200101 13:00:00', '20200101 14:00:00','20200101 15:00:00','20200101 16:00:00','20200101 17:00:00']))
df_EventB['EventBStart'] = pd.Series(pd.to_datetime(['20200101 9:45','20200101 10:45:00', '20200101 11:45:00', '20200101 12:45:00', '20200101 13:45:00', '20200101 14:45:00','20200101 15:45:00','20200101 16:45:00']))
df_EventB['EventBEnd'] = pd.Series(pd.to_datetime(['20200101 10:00','20200101 11:00:00', '20200101 12:00:00', '20200101 13:00:00', '20200101 14:00:00','20200101 15:00:00','20200101 16:00:00','20200101 17:00:00']))

df_EventA['EventATotal'] = (df_EventA['EventAEnd']-df_EventA['EventAStart']).dt.total_seconds() / 60
df_EventB['EventBTotal'] = (df_EventB['EventBEnd']-df_EventB['EventBStart']).dt.total_seconds() / 60
df_EventA['overlap_minutes']=0

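def overlap(x, df_EventB):
    # x is one EventA row; every quantity below is a Series over all EventB rows.
    # Combined length of this EventA and each EventB, in minutes.
    total_minutes = x['EventATotal'] + df_EventB['EventBTotal']
    # Signed offsets between the start times and between the end times, in minutes.
    start_diff = (df_EventB['EventBStart'] - x['EventAStart']).dt.total_seconds() / 60
    end_diff = (df_EventB['EventBEnd'] - x['EventAEnd']).dt.total_seconds() / 60
    # Positive where the two ranges overlap, zero or negative where they do not.
    breaks = (total_minutes - abs(start_diff) - abs(end_diff)) / 2
    # Sum only the positive values to get the total overlap for this EventA row.
    x['overlap_minutes'] = breaks[breaks > 0].sum()
    return x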

df_EventA = df_EventA.apply(overlap,axis=1, args=(df_EventB,))
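For comparison, here is a minimal sketch of the min/max formulation without any Python-level loop, using NumPy broadcasting. Since min(p, q) = (p + q - |p - q|) / 2 and max(p, q) = (p + q + |p - q|) / 2, the `breaks` expression above works out to min(EventAEnd, EventBEnd) - max(EventAStart, EventBStart), so the two approaches should agree. The function name `overlap_minutes` and its parameter names are hypothetical, chosen only for this illustration, and the sketch assumes timezone-naive datetimes.

```python
import numpy as np
import pandas as pd

def overlap_minutes(a_start, a_end, b_start, b_end):
    """Total minutes each A range overlaps with all B ranges (hypothetical helper)."""
    # Column vectors of shape (len(A), 1) broadcast against B arrays of
    # shape (len(B),), giving one (len(A), len(B)) grid of pairwise values.
    a_s = np.asarray(a_start, dtype="datetime64[ns]")[:, None]
    a_e = np.asarray(a_end, dtype="datetime64[ns]")[:, None]
    b_s = np.asarray(b_start, dtype="datetime64[ns]")
    b_e = np.asarray(b_end, dtype="datetime64[ns]")

    # Pairwise overlap = earliest end - latest start, converted to minutes.
    delta = (np.minimum(a_e, b_e) - np.maximum(a_s, b_s)) / np.timedelta64(1, "m")

    # Non-overlapping pairs come out zero or negative; clip them to zero and sum per A.
    return np.clip(delta, 0, None).sum(axis=1)

# Two A ranges and two back-to-back B ranges on the same day.
a_start = pd.to_datetime(["20200101 9:00", "20200101 11:00"])
a_end = pd.to_datetime(["20200101 11:00", "20200101 12:00"])
b_start = pd.to_datetime(["20200101 10:00", "20200101 10:30"])
b_end = pd.to_datetime(["20200101 10:30", "20200101 11:00"])

print(overlap_minutes(a_start, a_end, b_start, b_end))
```

One caveat that applies to both versions: overlaps are summed per EventB row, so if EventB ranges overlap each other, the time they share is counted once for each of them.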

Hi, and welcome to SO! Could you clarify your _question_ so we can come up with an appropriate answer? As it is, it seems unclear what kind of answer you're hoping for. - Nicolas Gervais, May 18 '20 at 16:00
Hi @MrFuppes, thanks for the input. The linked method fails in this scenario: EventA: 9:00-11:00, 11:00-12:00; EventB: 10:00-10:30, 10:30-11:00. If we adjusted it to return timedeltas instead of True/False, we would get 30 minutes for 9:00-11:00 instead of 1 hour. Please let me know if I'm missing anything. - Mat, May 18 '20 at 19:14
My question differs from that one in that I want the number of overlapping minutes as my output, so for any one row I have to take multiple possible matches into account. - Mat, May 18 '20 at 19:23
