I am trying to modify the overlapping time period problem so that if there is 1 day difference between dates, it should still be counted as an overlap. As long as the difference in dates is less than 2 days it should be seen as an overlap.
This is the dataframe containing the dates
df_dates = pd.DataFrame({"id": [102, 102, 102, 102, 103, 103, 104, 104, 104, 102, 104, 104, 103, 106, 106, 106],
"start dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3), pd.Timestamp(2002,10,20), pd.Timestamp(2003, 4, 4), pd.Timestamp(2003, 8, 9), pd.Timestamp(2005, 2, 8), pd.Timestamp(1993, 1, 1), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 16), pd.Timestamp(2002, 11, 16), pd.Timestamp(2005, 2, 23), pd.Timestamp(2005, 10, 11), pd.Timestamp(2015, 2, 9), pd.Timestamp(2011, 11, 24), pd.Timestamp(2011, 11, 24), pd.Timestamp(2011, 12, 21)],
"end dates": [pd.Timestamp(2002, 1, 3), pd.Timestamp(2002, 12, 3),pd.Timestamp(2002,11,20), pd.Timestamp(2003, 4, 4), pd.Timestamp(2004, 11, 1), pd.Timestamp(2015, 2, 8), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 15) , pd.Timestamp(2005, 2, 21), pd.Timestamp(2003, 2, 16), pd.Timestamp(2005, 10, 8), pd.Timestamp(2005, 10, 21), pd.Timestamp(2015, 2, 17), pd.Timestamp(2011, 12, 31), pd.Timestamp(2011, 11, 25), pd.Timestamp(2011, 12, 22)]
})
This was helpful with answering the overlap question but I am not sure how to modify it (red circle) to include 1 day difference
This was my attempt at answering the question, which kind of did (red circle), but then the overlap calculation is not always right (yellow circle)
def Dates_Restructure(df, pers_id, start_dates, end_dates):
df.sort_values([pers_id, start_dates], inplace=True)
df['overlap'] = (df.groupby(pers_id)
.apply(lambda x: (x[end_dates].shift() - x[start_dates]) < timedelta(days=-1))
.reset_index(level=0, drop=True))
df['cumsum'] = df.groupby(pers_id)['overlap'].cumsum()
return df.groupby([pers_id, 'cumsum']).aggregate({start_dates: min, end_dates: max}).reset_index()
I will appreciate your help with this. Thanks