I need to filter rows in a dataframe, based on the value of the "date" column. This is a sample of such a column:

2022-09-08 09:00:00-05
2022-09-08 12:00:00-05
2022-09-08 15:00:00-05
2022-09-08 18:00:00-05
2022-09-09 09:00:00-05
2022-09-09 12:00:00-05
2022-09-09 15:00:00-05
2022-09-09 18:00:00-05
2022-09-10 09:00:00-05
2022-09-10 12:00:00-05
2022-09-10 15:00:00-05

Dtype is set to:

datetime64[ns, UTC-05:00]

The "end_date" is the maximum value of the "date" column, and the "start_date" is the "end_date" minus 14 days. I don't need the time part, so I removed it with .date():

data_df = pd.DataFrame(data=data, columns=columns)
end_date = pd.to_datetime(data_df['date'].max().date())
start_date = pd.to_datetime(end_date - timedelta(days=14))
mask = (data_df['date'] > start_date) & (data_df['date'] <= end_date)
filtered_df = data_df.loc[mask]

When I run the code, I get the following error:

TypeError: Cannot compare tz-naive and tz-aware datetime-like objects
TypeError: Invalid comparison between dtype=datetime64[ns, UTC-05:00] and Timestamp

How can I fix this?

BeRT2me
SamSepiol
  • Does this answer your question? [Select DataFrame rows between two dates](https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates) – moo Sep 12 '22 at 22:29

1 Answer

# Convert the whole column to datetime [If necessary]
# df.date = pd.to_datetime(df.date)

# Get max date
max_date = df.date.max().date()

# Here I use '1d' (1 day) for an interesting result.
# You'll use '14d' instead.
mask = df.date.dt.date.between(max_date - pd.to_timedelta('1d'), max_date)
f_df = df.loc[mask]
print(f_df)

# Output:
                        date
4  2022-09-09 09:00:00-05:00
5  2022-09-09 12:00:00-05:00
6  2022-09-09 15:00:00-05:00
7  2022-09-09 18:00:00-05:00
8  2022-09-10 09:00:00-05:00
9  2022-09-10 12:00:00-05:00
10 2022-09-10 15:00:00-05:00

TL;DR

To keep your current comparison format, replace both occurrences of data_df['date'] in the mask with data_df['date'].dt.date.
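As a minimal sketch of that change applied to the question's code, assuming hypothetical sample rows in place of the real data, and assuming the bounds are kept as plain datetime.date objects so that dates are compared with dates on both sides:

```python
from datetime import timedelta

import pandas as pd

# Hypothetical sample data mirroring the question's tz-aware column (UTC-05:00).
data_df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-08-20 09:00:00-05:00",
        "2022-08-27 09:00:00-05:00",
        "2022-09-09 18:00:00-05:00",
        "2022-09-10 15:00:00-05:00",
    ])
})

# Plain datetime.date bounds: no timezone on either side of the comparison.
end_date = data_df["date"].max().date()
start_date = end_date - timedelta(days=14)

# .dt.date yields the local calendar date of each timestamp.
dates = data_df["date"].dt.date
mask = (dates > start_date) & (dates <= end_date)
filtered_df = data_df.loc[mask]
print(filtered_df)
```

With these sample rows, the row dated exactly 14 days before the maximum is excluded because the lower bound is strict.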

BeRT2me
  • After testing your code (the "between" function is GREAT), I have one question: if I add the "inclusive" parameter, are the following two lines equivalent? mask = data_df.date.dt.date.between(max_date - pd.to_timedelta('1d'), max_date, inclusive='right') and mask = (data_df['date'].dt.date > start_date) & (data_df['date'].dt.date <= end_date). I think they are, but I want to make sure. – SamSepiol Sep 13 '22 at 07:53
  • https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.between.html – BeRT2me Sep 13 '22 at 07:55
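One way to check the equivalence asked about in the comments is to build both masks on a small sample and compare them. This is only a sketch: the rows are hypothetical, start_date is assumed to equal max_date minus one day, datetime.timedelta stands in for pd.to_timedelta('1d'), and the string form of inclusive is assumed to be available (pandas 1.3 or later).

```python
from datetime import timedelta

import pandas as pd

# Hypothetical sample rows with the same UTC-05:00 offset as the question.
data_df = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-09-08 18:00:00-05:00",
        "2022-09-09 09:00:00-05:00",
        "2022-09-09 18:00:00-05:00",
        "2022-09-10 09:00:00-05:00",
        "2022-09-10 15:00:00-05:00",
    ])
})

dates = data_df["date"].dt.date
max_date = data_df["date"].max().date()
start_date = max_date - timedelta(days=1)  # assumed lower bound

# inclusive="right" excludes the left bound and includes the right bound.
mask_between = dates.between(start_date, max_date, inclusive="right")

# Explicit form of the same half-open interval (start_date, max_date].
mask_manual = (dates > start_date) & (dates <= max_date)

print(mask_between.equals(mask_manual))
```

On this sample the two masks agree; rows dated exactly start_date are dropped by both.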