1

I have a pandas dataframe like this (with timestamp converted to datetime object):

    id  timestamp
0   221 2020-11-07 12:02:00
1   223 2020-11-08 13:21:00
2   224 2020-11-09 12:50:00
3   225 2020-11-10 14:23:00
4   226 2020-11-11 12:25:00
5   227 2020-11-14 14:26:00

I want to find number of rows between a time interval. e.g 12:00-13:00, here it would be 3 (entries 0, 2 and 4)

mesh
  • 849
  • 9
  • 16
  • 2
    Looks like [between_time](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.between_time.html) will be helpfull – JvdV Mar 26 '20 at 21:07
  • 1
    Here's an example of between_time from SO https://stackoverflow.com/questions/35052691/filter-pandas-dataframe-by-time – lsabi Mar 26 '20 at 21:12
  • Have you tried anything, done any research? – AMC Mar 26 '20 at 21:27
  • Does this answer your question? [Select DataFrame rows between two dates](https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates) – AMC Mar 26 '20 at 21:28
  • between_time would work. Just set timestamp as index. – wwnde Mar 26 '20 at 21:30

4 Answers4

3

So as I commented, I think you can just use between_time function:

CountRows = df.set_index('timestamp').between_time('12:00','13:00').shape[0]

This, in your case, sets column timestamp as index and then returns the count of rows between the two timevalues. Where:

"The first element of the tuple returned by Dataframe.shape contains the number of items in index in a dataframe i.e. basically the number of rows in the dataframe. Source

JvdV
  • 70,606
  • 8
  • 39
  • 70
1

Since I believe that suggested between_time works only for DatetimeIndex, you can either set your DataFrame index to 'timestamp' and then use between_time or, alternatively, first, use sorting

df.sort_values(by='timestamp',axis='columns',inplace=True)

Then use sorted search

start = df['timestamp'].searchsorted(pd.Timestamp('2020-11-07 12:00:00'), side='left')
end = df['timestamp'].searchsorted(pd.Timestamp('2020-11-07 13:00:00'), side='right')

Then find number of rows

count = start - end
Suthiro
  • 1,210
  • 1
  • 7
  • 18
1

Please try

Coerce timestamp to datetime and sort ascending

  df['timestamp']=pd.to_datetime(df['timestamp']).sort_values(ascending=True)

Reset index but do not drop to keep id

df.reset_index(drop=False, inplace=True)

Set timestamp as new index to allow use of df.betweeen time

df.set_index(df['timestamp'], inplace=True)
df.between_time('12:00', '13:00')
wwnde
  • 26,119
  • 6
  • 18
  • 32
0

As suggested by posts in Comments section between_time works well. One need to make timestamp (which is a datetime64 object here) a index first, then use the between_time function.

mesh
  • 849
  • 9
  • 16