2

Trying to figure out a formula to see whether two dates are overlapping for the same organization. I figure there is a way to do with

groupby('org_id').apply(lambda x: max(x.end_date1) > min(x.start_date2) & max(x.end_date2) > min(x.start_date1))

but preferably this would iterate through all dates (in the event that three dates overlap for one org).

Is there an easy way to do that?

This would be the intended outcome (in table form):

desired output

tdy
  • 36,675
  • 19
  • 86
  • 83
sam-remler
  • 21
  • 1
  • Non-overlapping is defined by (End of B < Start of A) OR (End of A is < Start of B). – Scott Boston Jun 22 '21 at 19:46
  • This is an interesting problem, particularly when interested in vectorization, but unfortunately I don't have time to get into this so I can only [link to this answer for some reference](https://stackoverflow.com/a/325964/14170672) – Stryder Jun 22 '21 at 20:33
  • What does your input data look like? Your apply function implies that each row contains more than one date - unlike your intended outcome table. Plus, it look like you have a requirement for more than just 2 dates per org. True? – jch Jun 23 '21 at 03:26
  • And in the case where at least two dates overlap then would all of the dates in an org be flagged as overlapping? Or just the dates that overlap? – jch Jun 23 '21 at 05:14

1 Answers1

0

In this answer I assume that you want to mark the entirety of an Org_ID as True if there is any overlap in dates. This approach handles situations where there are more than two date ranges to be compares.

The idea is to just start throwing the individual days for a given date range into a bucket. If one of the days that you are about to throw in the bucket is already in the bucket then you have an overlap. In which case the whole Org_ID is marked as True (has overlap).

d = [
    {'Org_ID': 'A', 'Start_Date': '6/1/2020', 'End_Date': '5/31/2022'},
    {'Org_ID': 'A', 'Start_Date': '12/1/2020', 'End_Date': '11/30/2021'},
    {'Org_ID': 'B', 'Start_Date': '6/1/2020', 'End_Date': '5/31/2021'},
    {'Org_ID': 'B', 'Start_Date': '6/1/2021', 'End_Date': '6/1/2022'},
    {'Org_ID': 'C', 'Start_Date': '1/1/2020', 'End_Date': '3/31/2020'},
    {'Org_ID': 'C', 'Start_Date': '4/1/2020', 'End_Date': '5/31/2020'},
    {'Org_ID': 'C', 'Start_Date': '6/1/2020', 'End_Date': '7/31/2020'},
    {'Org_ID': 'C', 'Start_Date': '8/1/2020', 'End_Date': '10/31/2020'},
    {'Org_ID': 'C', 'Start_Date': '11/1/2020', 'End_Date': '12/31/2021'},
    {'Org_ID': 'D', 'Start_Date': '1/1/2020', 'End_Date': '2/28/2020'},
    {'Org_ID': 'D', 'Start_Date': '3/1/2020', 'End_Date': '3/31/2020'},
    {'Org_ID': 'D', 'Start_Date': '4/1/2020', 'End_Date': '8/31/2020'},
    {'Org_ID': 'D', 'Start_Date': '8/1/2020', 'End_Date': '10/31/2020'},
    {'Org_ID': 'D', 'Start_Date': '11/1/2020', 'End_Date': '12/31/2021'},
]

df = pd.DataFrame(d)

df['Range'] = df.apply(lambda x: pd.date_range(start=x['Start_Date'], end=x['End_Date']), axis=1)

def determine_overlap(org_rng_group):
    bucket = pd.DatetimeIndex(['1/1/1900'])
    
    for x in org_rng_group:
        if x.isin(bucket).any():
            return [True]*len(org_rng_group)
        
        bucket = bucket.append(x)
        
    return [False]*len(org_rng_group)

df['Overlap'] = df.groupby('Org_ID')['Range'].transform(determine_overlap)
df.drop('Range', axis=1)

   Org_ID Start_Date    End_Date  Overlap
0       A   6/1/2020   5/31/2022     True
1       A  12/1/2020  11/30/2021     True
2       B   6/1/2020   5/31/2021    False
3       B   6/1/2021    6/1/2022    False
4       C   1/1/2020   3/31/2020    False
5       C   4/1/2020   5/31/2020    False
6       C   6/1/2020   7/31/2020    False
7       C   8/1/2020  10/31/2020    False
8       C  11/1/2020  12/31/2021    False
9       D   1/1/2020   2/28/2020     True
10      D   3/1/2020   3/31/2020     True
11      D   4/1/2020   8/31/2020     True
12      D   8/1/2020  10/31/2020     True
13      D  11/1/2020  12/31/2021     True

jch
  • 3,600
  • 1
  • 15
  • 17