
If I have one DataFrame of start and end times like:

2015-11-21 16:00:00 2015-11-22 00:30:00
2015-05-16 12:15:00 2015-05-16 22:03:00
2015-10-15 16:00:00 2015-10-15 23:30:00

And then a Series of timestamps like this:

2015-11-21 18:42:13
2015-11-21 00:32:00
2015-05-16 12:37:00

And I want to be able to filter that Series to get only the times within the above times, so in this case:

2015-11-21 18:42:13
2015-05-16 12:37:00

series[series.between_time(df.start, df.end)] won't work because between_time needs scalar times, and series[df.start < series < df.end] gives me a ValueError because the Series aren't identically labelled.

I feel like there must be a way to do this with a Timedelta or something similar, but I can't figure out how.
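For reference, one vectorized way to express "is this timestamp inside any of the intervals" uses pd.IntervalIndex (a sketch, assuming closed='neither' is the right match for strict < comparisons on both ends):

```python
import pandas as pd

# Sample data from the question (assumed column names: start, end)
df = pd.DataFrame({
    'start': pd.to_datetime(['2015-11-21 16:00:00', '2015-05-16 12:15:00',
                             '2015-10-15 16:00:00']),
    'end':   pd.to_datetime(['2015-11-22 00:30:00', '2015-05-16 22:03:00',
                             '2015-10-15 23:30:00']),
})
series = pd.Series(pd.to_datetime(['2015-11-21 18:42:13',
                                   '2015-11-21 00:32:00',
                                   '2015-05-16 12:37:00']))

# One Interval per (start, end) row; 'neither' excludes both endpoints
idx = pd.IntervalIndex.from_arrays(df['start'], df['end'], closed='neither')

# Keep timestamps that fall inside at least one interval
mask = series.apply(lambda t: idx.contains(t).any())
print(series[mask])
```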

Mike D

2 Answers


You can first use a cross join and then filter with query:

print (df)
                start                 end
0 2015-11-21 16:00:00 2015-11-22 00:30:00
1 2015-05-16 12:15:00 2015-05-16 22:03:00
2 2015-10-15 16:00:00 2015-10-15 23:30:00

print (series)
0   2015-11-21 18:42:13
1   2015-11-21 00:32:00
2   2015-05-16 12:37:00 <-changed date
Name: date, dtype: datetime64[ns]

df = pd.merge(df.assign(A=1), series.to_frame().assign(A=1), on='A')
df = df.query('start < date < end')['date']
print (df)
0   2015-11-21 18:42:13
5   2015-05-16 12:37:00
Name: date, dtype: datetime64[ns]
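As a self-contained sketch of the same idea: on pandas 1.2+ the helper column A can be dropped in favour of how='cross' (assuming the column names start, end, and date from above):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'start': pd.to_datetime(['2015-11-21 16:00:00', '2015-05-16 12:15:00',
                             '2015-10-15 16:00:00']),
    'end':   pd.to_datetime(['2015-11-22 00:30:00', '2015-05-16 22:03:00',
                             '2015-10-15 23:30:00']),
})
series = pd.Series(pd.to_datetime(['2015-11-21 18:42:13',
                                   '2015-11-21 00:32:00',
                                   '2015-05-16 12:37:00']), name='date')

# Pair every interval with every timestamp, then keep rows where the
# timestamp falls strictly inside the interval (pandas >= 1.2)
out = (pd.merge(df, series.to_frame(), how='cross')
         .query('start < date < end')['date'])
print(out)
```

The cross join materializes len(df) * len(series) rows, so this is fine for small inputs but can get expensive on large ones.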
jezrael

Assuming your DataFrame is df and your Series is series, we can use searchsorted to check whether the sorted position of each value in series lands within the same row.

df.sort_values('start', inplace=True)

series.loc[
    df.iloc[:, 0].searchsorted(series) - 1 == df.iloc[:, 1].searchsorted(series)
]

0   2015-11-21 18:42:13
2   2015-05-16 12:37:00
dtype: datetime64[ns]

We can numpyfy this with

v = df.sort_values('start').values
s = series.values

series.loc[v[:, 0].searchsorted(s) - 1 == v[:, 1].searchsorted(s)]

0   2015-11-21 18:42:13
2   2015-05-16 12:37:00
dtype: datetime64[ns]
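Put together as a runnable sketch (assuming start is the first column, end the second, and the intervals don't overlap, so both columns are sorted after the sort on start):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'start': pd.to_datetime(['2015-11-21 16:00:00', '2015-05-16 12:15:00',
                             '2015-10-15 16:00:00']),
    'end':   pd.to_datetime(['2015-11-22 00:30:00', '2015-05-16 22:03:00',
                             '2015-10-15 23:30:00']),
})
series = pd.Series(pd.to_datetime(['2015-11-21 18:42:13',
                                   '2015-11-21 00:32:00',
                                   '2015-05-16 12:37:00']))

# Sort intervals so searchsorted's sorted-input assumption holds
v = df.sort_values('start').values
s = series.values

# A timestamp inside interval i sorts just past start i (position i+1
# against the starts) but before end i (position i against the ends),
# so the two searches agree only for timestamps inside some interval
mask = v[:, 0].searchsorted(s) - 1 == v[:, 1].searchsorted(s)
result = series.loc[mask]
print(result)
```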
piRSquared