Timestamp filtering with pandas

Question

I am working with those two dataframes. df1 is a list of potential shift that the person can do. And df2 is a basically his/her schedule when he/she is available.

Starts_at is the begining of the shift/availability ; Ends_at is the end of the shift/availability

I would like to to remove all the potential shift that the person cannot do because of the schedule. Like for example you can see that this person is not available on the morning so I should remove 0 and 3 index in n1

I started doing something like that df1[df1.starts_at[0] >= df2.starts_at] but it is not working

n1 = {'starts_at': {0: Timestamp('2019-06-02 09:00:00'),
  1: Timestamp('2019-06-02 17:00:00'),
  2: Timestamp('2019-06-02 14:00:00'),
  3: Timestamp('2019-06-03 09:30:00')},
 'ends_at': {0: Timestamp('2019-06-02 17:00:00'),
  1: Timestamp('2019-06-02 22:30:00'),
  2: Timestamp('2019-06-02 22:30:00'),
  3: Timestamp('2019-06-03 13:00:00')}}

n2 = {'starts_at': {0: Timestamp('2019-06-01 14:00:00'),
  1: Timestamp('2019-06-01 14:41:32.464000'),
  2: Timestamp('2019-06-01 15:00:00'),
  3: Timestamp('2019-06-02 10:00:00'),
  4: Timestamp('2019-06-02 14:00:00'),
  5: Timestamp('2019-06-02 17:00:00'),
  6: Timestamp('2019-06-02 17:30:00'),
  7: Timestamp('2019-06-03 17:00:00')},
 'ends_at': {0: Timestamp('2019-06-01 22:30:00'),
  1: Timestamp('2019-06-01 22:32:44.862000'),
  2: Timestamp('2019-06-01 22:30:00'),
  3: Timestamp('2019-06-02 18:30:00'),
  4: Timestamp('2019-06-03 00:00:00'),
  5: Timestamp('2019-06-03 00:00:00'),
  6: Timestamp('2019-06-02 21:00:00'),
  7: Timestamp('2019-06-03 23:00:00')}}

df1 = pd.DataFrame(n1)
df2 = pd.DataFrame(n2)

thanks for your help

PD https://stackoverflow.com/questions/42462218/find-date-range-overlap-in-python — splash58, Oct 30 '19 at 13:23

score 2 · Answer 1 · answered Oct 30 '19 at 13:40

You can use np.greater_equal and np.less_equal with as ufunc with outer to be able to compare all 'starts_at' between both dfs at once and the same for 'ends_at'. Then use any over the column to be able to catch the rows in df1 that have any possibility:

mask = ( np.less_equal.outer(df1.starts_at, df2.starts_at)
         & np.greater_equal.outer(df1.ends_at, df2.ends_at)).any(1)
print (df1[mask])
            starts_at             ends_at
1 2019-06-02 17:00:00 2019-06-02 22:30:00
2 2019-06-02 14:00:00 2019-06-02 22:30:00

For a bit more details, np.less_equal.outer(df1.starts_at, df2.starts_at) gives an array where rows are the value of df1.starts_at and they are compared to the values of starts_at in df2 and what we look for is if the starts in df1 are less (or earlier) than the one in df2:

array([[False, False, False,  True,  True,  True,  True,  True],
       [False, False, False, False, False,  True,  True,  True],
       [False, False, False, False,  True,  True,  True,  True],
       [False, False, False, False, False, False, False,  True]])

baccandr · Answer 2 · 2019-10-30T14:04:04.473

0

You can also use numpy broadcasting power:

i,j=np.where((df1.starts_at.values[:,None]>=df2.starts_at.values)&\
            (df1.ends_at.values[:,None]<=df2.ends_at.values))

#select values and remove duplicates
dfsel=df1.iloc[i][~df1.iloc[i].index.duplicated(keep='first')]

edited Oct 30 '19 at 14:04

answered Oct 30 '19 at 13:58

baccandr

1,090
8
15

Timestamp filtering with pandas

2 Answers2