Modifying overlapping time period to include 1 day difference

Question

I am trying to modify the overlapping time period problem so that if there is 1 day difference between dates, it should still be counted as an overlap. As long as the difference in dates is less than 2 days it should be seen as an overlap.

This is the dataframe containing the dates

df_dates = pd.DataFrame({"id": [102, 102, 102, 102, 103, 103, 104, 104, 104, 102, 104, 104, 103, 106, 106, 106], 
                         "start dates": [pd.Timestamp(2002, 1, 1), pd.Timestamp(2002, 3, 3), pd.Timestamp(2002,10,20), pd.Timestamp(2003, 4, 4), pd.Timestamp(2003, 8, 9), pd.Timestamp(2005, 2, 8), pd.Timestamp(1993, 1, 1), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 16), pd.Timestamp(2002, 11, 16), pd.Timestamp(2005, 2, 23), pd.Timestamp(2005, 10, 11), pd.Timestamp(2015, 2, 9), pd.Timestamp(2011, 11, 24), pd.Timestamp(2011, 11, 24), pd.Timestamp(2011, 12, 21)],
                         "end dates": [pd.Timestamp(2002, 1, 3), pd.Timestamp(2002, 12, 3),pd.Timestamp(2002,11,20), pd.Timestamp(2003, 4, 4), pd.Timestamp(2004, 11, 1), pd.Timestamp(2015, 2, 8), pd.Timestamp(2005, 2, 3), pd.Timestamp(2005, 2, 15) , pd.Timestamp(2005, 2, 21),  pd.Timestamp(2003, 2, 16), pd.Timestamp(2005, 10, 8), pd.Timestamp(2005, 10, 21), pd.Timestamp(2015, 2, 17), pd.Timestamp(2011, 12, 31), pd.Timestamp(2011, 11, 25), pd.Timestamp(2011, 12, 22)]
                        })

This was helpful with answering the overlap question but I am not sure how to modify it (red circle) to include 1 day difference

This was my attempt at answering the question, which kind of did (red circle), but then the overlap calculation is not always right (yellow circle)

def Dates_Restructure(df, pers_id, start_dates, end_dates):
    df.sort_values([pers_id, start_dates], inplace=True)
    df['overlap'] = (df.groupby(pers_id)
                     .apply(lambda x: (x[end_dates].shift() - x[start_dates]) < timedelta(days=-1))
                     .reset_index(level=0, drop=True))
    df['cumsum'] = df.groupby(pers_id)['overlap'].cumsum()
    return df.groupby([pers_id, 'cumsum']).aggregate({start_dates: min, end_dates: max}).reset_index()

I will appreciate your help with this. Thanks

Iruoma · Answer 1 · 2022-04-18T14:45:44.677

This was the answer I came up with and it worked. I combined the 2 solutions in my question to get this solution.

def Dates_Restructure(df_dates, pers_id, start_dates, end_dates):
    df2 = df_dates.copy()
    startdf2 = pd.DataFrame({pers_id: df2[pers_id], 'time': df2[start_dates], 'start_end': 1})
    enddf2 = pd.DataFrame({pers_id: df2[pers_id], 'time': df2[end_dates], 'start_end': -1})
    mergedf2 = pd.concat([startdf2, enddf2]).sort_values([pers_id, 'time'])
    mergedf2['cumsum'] = mergedf2.groupby(pers_id)['start_end'].cumsum()
    mergedf2['new_start'] = mergedf2['cumsum'].eq(1) & mergedf2['start_end'].eq(1)
    mergedf2['group'] = mergedf2.groupby(pers_id)['new_start'].cumsum()
    df2['group_id'] = mergedf2['group'].loc[mergedf2['start_end'].eq(1)]
    df3 = df2.groupby([pers_id, 'group_id']).aggregate({start_dates: min, end_dates: max}).reset_index()  
    df3.sort_values([pers_id, start_dates], inplace=True)
    df3['overlap'] = (df3.groupby(pers_id).apply(lambda x: (x[end_dates].shift() - x[start_dates]) < timedelta(days=-1))
                      .reset_index(level=0, drop=True))
    df3['GROUP_ID'] = df3.groupby(pers_id)['overlap'].cumsum()
    return df3.groupby([pers_id, 'GROUP_ID']).aggregate({start_dates: min, end_dates: max}).reset_index()

Modifying overlapping time period to include 1 day difference

1 Answers1