-5

I have a data frame, df, where one of the columns stores processing times (TimeStamp objects).

A sample data frame:

from datetime import datetime, date
import pandas as pd

ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
dates = [date(2015,9,14), date(2015,9,17), date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40), datetime(2015,9,17,9,6,7), datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids, dates, datetimes))

df = pd.DataFrame(datalist, columns=['ID', 'ProcessDate', 'ProcessingTime'])

enter image description here

What I want to achieve is to extract all records that satisfy a certain condition (or multiple conditions). In one case, I want to locate all records where 'ProcessingTime' attribute has an hour value that is greater than 13:10. In the sample data frame above, the desired output in this case would be the first record.

What is the proper way to apply this type of condition to data frame records?


P.S. I tried using the following, but both did not work:

df.loc[ (df['ProcessTime'].time().hour > 14) ]

This throws a "AttributeError" as 'Series' object has no attribute 'time'

and

df.loc[ (df['ProcessTime'] > datetime.time(14, 0, 0) ]

This throws a "TypeError" as Invalid comparison between dtype=datetime64[ns] and time

O. Mohsen
  • 179
  • 1
  • 6
  • Does this answer your question? [Select DataFrame rows between two dates](https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates) – Trenton McKinney Aug 11 '20 at 15:20

1 Answers1

2
import pandas as pd
from datetime import date, datetime, time

ids = ['WO_EW-1_10AUR-15-0031_00', 'IW-12_0400-15-0012_00', 'E-8_10AUR-18-0037_00']
dates = [date(2015,9,14), date(2015,9,17), date(2018,8,16)]
datetimes = [datetime(2015,9,14,13,23,40), datetime(2015,9,17,9,6,7), datetime(2018,8,16,7,32,6)]
datalist = list(zip(ids, dates, datetimes))

df = pd.DataFrame(datalist, columns=['ID', 'ProcessDate', 'ProcessingTime'])

# display(df)
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
2      E-8_10AUR-18-0037_00  2018-08-16 2018-08-16 07:32:06

# single condition
df[df.ProcessingTime.dt.hour > 7]

[out]:
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07

# multiple conditions
df[(df.ProcessingTime.dt.hour > 7) & (df.ProcessingTime.dt.minute > 10)]

[out]:
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40

# an entire datetime
df[df.ProcessingTime < '2015-09-17 09:06:07']

[out]:
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40

# using .time
df[df.ProcessingTime.dt.time > time.fromisoformat('07:32:06')]

[out]:
                         ID ProcessDate      ProcessingTime
0  WO_EW-1_10AUR-15-0031_00  2015-09-14 2015-09-14 13:23:40
1     IW-12_0400-15-0012_00  2015-09-17 2015-09-17 09:06:07
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158