Find date range overlap in python and return overlap

Question

I am working on a problem similar to the one here: I have a dataframe with two datetime columns, and I need to identify overlaps.

import pandas as pd
df = pd.DataFrame(columns=['id','from','to'], index=range(5),
                  data=[[878,'2006-01-01','2007-10-01'],
                        [878,'2007-10-02','2008-12-01'],
                        [878,'2008-12-02','2010-04-03'],
                        [879,'2010-04-04','2199-05-11'],
                        [879,'2016-05-12','2199-12-31']])

df['from'] = pd.to_datetime(df['from'])
df['to'] = pd.to_datetime(df['to'])

The following works well to flag the presence of an overlap as a boolean column:

df['overlap'] = (df.groupby('id')
                   .apply(lambda x: (x['to'].shift() - x['from']) > pd.Timedelta(seconds=0))
                   .reset_index(level=0, drop=True))

which returns (correctly):

    id       from         to  overlap
0  878 2006-01-01 2007-10-01    False
1  878 2007-10-02 2008-12-01    False
2  878 2008-12-02 2010-04-03    False
3  879 2010-04-04 2199-05-11    False
4  879 2016-05-12 2199-12-31     True
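
For reference, the same flag can probably be computed without `apply`, by shifting `to` within each `id` group and comparing it directly to `from`. This is only a sketch: it assumes the rows are already sorted by `from` within each `id`, and it only compares each row with the one immediately before it. The name `prev_to` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [878, 878, 878, 879, 879],
        "from": pd.to_datetime(
            ["2006-01-01", "2007-10-02", "2008-12-02", "2010-04-04", "2016-05-12"]
        ),
        "to": pd.to_datetime(
            ["2007-10-01", "2008-12-01", "2010-04-03", "2199-05-11", "2199-12-31"]
        ),
    }
)

# End date of the previous row within the same id (NaT for the first row of each group).
prev_to = df.groupby("id")["to"].shift()

# A row overlaps its predecessor when the predecessor ends after this row starts.
# Comparisons against NaT evaluate to False, so first rows are never flagged.
df["overlap"] = prev_to > df["from"]

print(df)
```

Because the shifted series keeps the original index, the result aligns with `df` directly and no `reset_index` is needed.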

I would now like to extend the solution by keeping the start and the end of the overlap whenever there is one. I have tried having the apply return a pd.Series, as in:

df.groupby('id').apply(lambda x:
    pd.Series([x['to'].shift() - x['from'] > pd.Timedelta(seconds=0),
               x['from'],
               x['to'].shift()],
              index=['is_overlap', 'start_overlap', 'end_overlap']))

But the resulting dataframe has a completely different shape (no longer 5 rows). What I want is:

    id       from         to  is_overlap start_overlap end_overlap
0  878 2006-01-01 2007-10-01       False           NaT         NaT
1  878 2007-10-02 2008-12-01       False           NaT         NaT
2  878 2008-12-02 2010-04-03       False           NaT         NaT
3  879 2010-04-04 2199-05-11       False           NaT         NaT
4  879 2016-05-12 2199-12-31        True    2016-05-12  2199-05-11
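
One way that might produce this shape is to skip `apply` altogether: compute the shifted `to` per group once, then use `Series.where` to keep the dates only on the overlapping rows. This is a sketch under assumptions, not a definitive solution. It assumes the rows are sorted by `from` within each `id`, that only adjacent rows need comparing, and that the previous interval ends before the current one does (so the previous `to` is the end of the overlap, as in the desired output above). The name `prev_to` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [878, 878, 878, 879, 879],
        "from": pd.to_datetime(
            ["2006-01-01", "2007-10-02", "2008-12-02", "2010-04-04", "2016-05-12"]
        ),
        "to": pd.to_datetime(
            ["2007-10-01", "2008-12-01", "2010-04-03", "2199-05-11", "2199-12-31"]
        ),
    }
)

# End date of the previous row within the same id (NaT for the first row of each group).
prev_to = df.groupby("id")["to"].shift()

# Flag rows that start before the previous row of the same id has ended.
df["is_overlap"] = prev_to > df["from"]

# Keep the dates only where an overlap was flagged; other rows become NaT.
df["start_overlap"] = df["from"].where(df["is_overlap"])
# Assumption: the previous interval ends first, so its end is the end of the overlap.
df["end_overlap"] = prev_to.where(df["is_overlap"])

print(df)
```

Every step here returns a Series aligned to the original index, so the frame keeps its 5 rows. If an earlier interval can extend past the end of the current one, the end of the overlap would instead be the smaller of `prev_to` and the current `to`.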

Can you also include what the resulting dataframe should look like? - Andreas, Nov 12 '18 at 09:29
So, what you want is to have only 1 `overlap` column with either `False` or the date range? - Andreas, Nov 12 '18 at 09:35
I know how to get the column with False. I need to get to the one with False and the date range. - 00__00__00, Nov 12 '18 at 09:36
