0

I have a pandas DataFrame with a date column named SDate. I would like to find the rows for which SDate's hour is between 2 and 4 pm, something like:

d=pd.read_csv("data.csv",parse_dates=['SDate'])
indices=np.nonzero(d.SDate.hour>=2&d.SDate.hour<=4)[0]

But this doesn't work. How can I do this in the most pandas-idiomatic and Pythonic way?

Ofek Ron

3 Answers

2

If you want a mask, this should do it:

import pandas as pd

df = pd.DataFrame(data=pd.date_range(start='1/1/2018', end='1/2/2018', freq='H'), columns=['SDate'])
mask = (df.SDate.dt.hour >= 2) & (df.SDate.dt.hour <= 4)
print(mask.values)

Output

[False False  True  True  True False False False False False False False
 False False False False False False False False False False False False
 False]

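In the above example, df is a DataFrame with a datetime-dtype column SDate, and dt is the accessor that exposes datetime-like properties such as hour. Each comparison is wrapped in parentheses because & binds more tightly than >= and <=.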
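
As a rough sketch of how such a mask might then be used to get the rows or their positions, as the question asks: the sample timestamps below are made up for illustration and stand in for the contents of data.csv, and Series.between is shown only as one possible shorthand for the two comparisons.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({"SDate": pd.to_datetime([
    "2018-01-01 01:30", "2018-01-01 02:00", "2018-01-01 04:59",
    "2018-01-01 05:00", "2018-01-01 14:15",
])})

hours = df["SDate"].dt.hour

# Boolean mask: hour is 2, 3 or 4 (i.e. 02:00 through 04:59)
mask = (hours >= 2) & (hours <= 4)

rows = df[mask]                              # the matching rows
positions = np.flatnonzero(mask.to_numpy())  # positional indices of the matches

# Series.between is inclusive on both ends by default
same_mask = hours.between(2, 4)

# For 2 pm to 4 pm, compare against 24-hour values instead
pm_positions = np.flatnonzero(hours.between(14, 16).to_numpy())
```

Because dt.hour uses a 24-hour clock, the afternoon range in the question corresponds to 14 and 16 rather than 2 and 4.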

Dani Mesejo
1

You can also use between_time, which works on a DataFrame whose index is a DatetimeIndex:

d = d.set_index('SDate')
d.between_time(start_time = '14:00', end_time = '16:00')
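
A small sketch of what this might look like end to end, assuming a made-up frame with an extra value column (both the timestamps and the column are hypothetical). It also shows DatetimeIndex.indexer_between_time, which should give integer positions without changing the index.

```python
import pandas as pd

# Hypothetical sample data; "value" is only there to make the result visible
d = pd.DataFrame({
    "SDate": pd.to_datetime([
        "2018-01-01 13:59", "2018-01-01 14:00",
        "2018-01-01 16:00", "2018-01-01 16:30",
    ]),
    "value": [1, 2, 3, 4],
})

# Rows whose time of day falls in [14:00, 16:00]; both ends are included by default
afternoon = d.set_index("SDate").between_time("14:00", "16:00")

# Integer positions of the same rows, leaving d's index untouched
positions = pd.DatetimeIndex(d["SDate"]).indexer_between_time("14:00", "16:00")
```

Note that between_time compares the full time of day, so 16:30 is excluded here, whereas a mask on hour <= 16 would keep it.
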
Mohit Motwani
0

You need to separate your conditions via parentheses:

s = pd.to_datetime(['2018-01-01 02:00', '2015-12-25 15:00'])

indices = np.nonzero(s.hour>=2 & s.hour<=4)[0]
# TypeError: unsupported operand type(s) for &: 'int' and 'Int64Index'

indices = np.nonzero((s.hour>=2) & (s.hour<=4))[0]
# array([0], dtype=int64)
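
The reason the parentheses matter is operator precedence: & binds more tightly than >= and <=, so the unparenthesized form is parsed as a chained comparison around 2 & s.hour. A minimal illustration with a plain integer (the value 5 is arbitrary) shows the two forms can disagree even when no error is raised:

```python
h = 5  # arbitrary example hour

# Parsed as: h >= (2 & h) <= 4, i.e. (h >= 0) and (0 <= 4), since 2 & 5 == 0
unparenthesized = h >= 2 & h <= 4

# The intended test: is h between 2 and 4 inclusive?
parenthesized = (h >= 2) & (h <= 4)

print(unparenthesized, parenthesized)
```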

Note that if you only need the position of the first matching row, you can call argmax on the boolean mask, which returns a scalar:

indices = ((s.hour>=2) & (s.hour<=4)).argmax()
# 0
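
One caveat worth sketching: argmax on an all-False mask also returns 0, so the result is ambiguous when nothing matches. A possible guard, using made-up timestamps and a hypothetical helper name first_match, might look like this:

```python
import numpy as np
import pandas as pd

def first_match(times, lo, hi):
    """Return the position of the first hour in [lo, hi], or None if there is none.

    Hypothetical helper for illustration only.
    """
    mask = np.asarray((times.hour >= lo) & (times.hour <= hi))
    return int(mask.argmax()) if mask.any() else None

with_match = pd.to_datetime(['2018-01-01 02:00', '2015-12-25 15:00'])
without_match = pd.to_datetime(['2018-01-01 07:00', '2015-12-25 15:00'])

print(first_match(with_match, 2, 4))
print(first_match(without_match, 2, 4))
```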
jpp