Filter Data Frame by time

Question

I have a large data frame that is imported from an Excel sheet. I already filtered it to exclude weekends, but I also need to filter it so that only daytime hours, e.g. 7:00 - 18:00, are displayed. Here is what the data frame looks like after successfully taking out the weekends.

picture of data

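import pandas as pd
from pandas.tseries.offsets import BDay

# True for timestamps that fall on a business day (Monday to Friday)
isBusinessDay = BDay().is_on_offset

match_series = pd.to_datetime(df['timestamp(America/New_York)']).map(isBusinessDay)
df_new = df[match_series]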

df_new

score 0 · Answer 1 · answered Dec 22 '21 at 00:13

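A simple approach is to filter on your datetime field using the Series dt accessor. The column must already have a datetime dtype; if it was read in as strings, convert it with pd.to_datetime first.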

In this case...

filt = (df['timestamp(America/New_York)'].dt.hour >= 7) & (df['timestamp(America/New_York)'].dt.hour <= 18)

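df_filtered = df.loc[filt, :]

Note that dt.hour <= 18 keeps every row up to 18:59. If the window should stop at 18:00, use < 18 instead (which also drops a row stamped exactly 18:00).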

More reading: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.html
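As an alternative to comparing hours, pandas also offers DataFrame.between_time, which selects rows by time of day when the frame has a DatetimeIndex. The following is only a minimal sketch: the sample rows and the 'value' column are made up for illustration, and the column name is copied from the question.

```python
import pandas as pd

# Hypothetical sample data; the column name mirrors the one in the question.
col = 'timestamp(America/New_York)'
df = pd.DataFrame({
    col: pd.to_datetime([
        '2021-12-20 06:59', '2021-12-20 07:00', '2021-12-20 12:30',
        '2021-12-20 18:00', '2021-12-20 18:01',
    ]),
    'value': [1, 2, 3, 4, 5],
})

# between_time needs a DatetimeIndex, so index by the timestamp first.
# By default both endpoints (07:00 and 18:00) are included.
daytime = df.set_index(col).between_time('07:00', '18:00')

print(daytime)
```

Because the comparison is on the full time of day rather than the hour, a row at 18:01 is dropped while a row at exactly 18:00 is kept. Call reset_index() afterwards if the timestamp should go back to being a regular column.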

For a sample of the dt accessor filter in action, see the code block below. The random date generator was taken from here and modified slightly.

import random
import time
import pandas as pd

def str_time_prop(start, end, time_format, prop):
    """Get a time at a proportion of a range of two formatted times.

    start and end should be strings specifying times formatted in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies the proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """

    stime = time.mktime(time.strptime(start, time_format))
    etime = time.mktime(time.strptime(end, time_format))

    ptime = stime + prop * (etime - stime)

    return time.strftime(time_format, time.localtime(ptime))


def random_date(start, end, prop):
    return str_time_prop(start, end, '%Y-%m-%d %I:%M %p', prop)

dates = {'dtfield':[random_date("2007-1-1 1:30 PM", "2009-1-1 4:50 AM", random.random()) for n in range(1000)]}

df = pd.DataFrame(data=dates)

df['dtfield'] = pd.to_datetime(df['dtfield'])

filt = (df['dtfield'].dt.hour >= 7) & (df['dtfield'].dt.hour <= 18)

df_filtered = df.loc[filt, :]

df_filtered
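Since the random sample makes it hard to see what happens at the edges of the window, here is a small deterministic sketch. The five timestamps are invented for illustration. It compares the hour-based filter with a comparison on dt.time, which may be closer to a strict 7:00 - 18:00 window.

```python
import datetime
import pandas as pd

# Hypothetical timestamps chosen to sit on either side of the boundaries.
ts = pd.Series(pd.to_datetime([
    '2008-03-03 06:59', '2008-03-03 07:00',
    '2008-03-03 18:00', '2008-03-03 18:45', '2008-03-03 19:00',
]))

# Hour-based filter from the answer: keeps anything from 07:00 to 18:59.
by_hour = ts[(ts.dt.hour >= 7) & (ts.dt.hour <= 18)]

# Time-of-day filter: keeps 07:00:00 through 18:00:00 inclusive.
tod = ts.dt.time
by_time = ts[(tod >= datetime.time(7, 0)) & (tod <= datetime.time(18, 0))]

print(by_hour)
print(by_time)
```

The hour-based version retains the 18:45 row, while the time-of-day version does not. Which one is right depends on whether "until 18:00" is meant to include the whole 18:00 hour.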
