I have a pandas dataframe where only a few columns are dates.
Example dataframe (the date column names are written as str here for the sake of the example; in my real data they are pd.Timestamp objects in an object-dtype Index):
df = pd.DataFrame({
    "activity": ["clean dishes", "fix porch", "sleep on couch"],
    "finished": ["NaT", "NaT", "2022-12-29"],
    "2022-12-27 00:00:00": [1, 1, 1],
    "2022-12-28 00:00:00": [1, 1, 1],
    "2022-12-29 00:00:00": [1, 1, 0]
})
print(df.columns)
Index(['activity', 'finished', 2022-12-27 00:00:00, 2022-12-28 00:00:00, 2022-12-29 00:00:00], dtype='object')
I want to convert the last three column names to dates (without the time component) so that I can compare the dates in the finished column with the column names and place a zero where the activity was already finished. I tried using this approach, but it did not work (including the suggestion in the comments).
To achieve my goal I created this:
from datetime import datetime
import pandas as pd

def format_header_dates(dataframe):
    """Converting the dates in the header to date"""
    for column in dataframe.columns:
        if isinstance(column, pd.Timestamp):
            new_column = pd.Timestamp(column).date()
            dataframe = dataframe.rename(columns={column: new_column})
    return dataframe

df = format_header_dates(df)
However, I get this warning:
FutureWarning: Comparison of Timestamp with datetime.date is deprecated in order to match the standard library behavior. In a future version these will be considered non-comparable. Use 'ts == pd.Timestamp(date)' or 'ts.date() == date' instead.
return key in self._engine
This leaves me with two questions:
- Is there a better way to convert a subset of column names to date?
- What exactly is causing this warning (isinstance?) and how can I make the necessary corrections?
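For reference, the two comparisons that the warning text recommends can be sketched as below. The variable names are illustrative only. A likely trigger in the function above (an assumption, not something confirmed here) is that renaming one column at a time leaves the header holding both datetime.date and pd.Timestamp labels, so later label lookups end up comparing the two types.

```python
from datetime import date

import pandas as pd

# Illustrative values; any Timestamp/date pair behaves the same way.
ts = pd.Timestamp("2022-12-27 00:00:00")
d = date(2022, 12, 27)

# Form 1 from the warning text: lift the date to a Timestamp, then compare.
same_as_timestamp = ts == pd.Timestamp(d)

# Form 2 from the warning text: reduce the Timestamp to a date, then compare.
same_as_date = ts.date() == d

print(same_as_timestamp, same_as_date)
```

Either form keeps both sides of the comparison the same type, which is what the deprecation message asks for.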
Solution:
After spending two days scratching my head and googling, I could not pinpoint the root cause of the FutureWarning, but I found a way around it.
Step 1: Convert every date to datetime64[ns] and normalize it (to set h:m:s:ns to zero, as I have no interest in such precision) with the following: pd.to_datetime(column).normalize().to_datetime64()
Step 2: Perform the operations I needed, which in my case meant comparing dates.
Step 3: Cosmetically adjust the dates by keeping only the date component with: pd.to_datetime(column).to_datetime64().astype('datetime64[D]')
This allowed me to do any date operations I wanted, and the FutureWarning: Comparison of Timestamp with datetime.date is deprecated... was no longer displayed.
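The three steps above can be put together roughly as follows. This is a minimal sketch, not the exact code I ran. The sample frame is rebuilt with pd.Timestamp headers to mimic the real data. The rule "write 0 on every date on or after the finished date" is an assumption about the intended comparison. The names date_columns, day, finished_by_then and date_only are made up for the example.

```python
import pandas as pd

# Sample frame with Timestamp headers, mimicking the real data.
df = pd.DataFrame({
    "activity": ["clean dishes", "fix porch", "sleep on couch"],
    "finished": pd.to_datetime([None, None, "2022-12-29"]),
    pd.Timestamp("2022-12-27"): [1, 1, 1],
    pd.Timestamp("2022-12-28"): [1, 1, 1],
    pd.Timestamp("2022-12-29"): [1, 1, 1],
})

# Only the Timestamp headers are treated as date columns.
date_columns = [c for c in df.columns if isinstance(c, pd.Timestamp)]

for col in date_columns:
    # Step 1: normalized datetime64 value for this header.
    day = pd.to_datetime(col).normalize().to_datetime64()
    # Step 2: compare like with like; NaT rows compare as False.
    # Assumed rule: zero out dates on or after the finished date.
    finished_by_then = df["finished"] <= day
    df.loc[finished_by_then, col] = 0

# Step 3: cosmetic pass, applied once with a single mapping so the
# header never holds a mix of half-converted labels during a loop.
date_only = {
    c: pd.to_datetime(c).to_datetime64().astype("datetime64[D]")
    for c in date_columns
}
df = df.rename(columns=date_only)

print(df)
```

Building the mapping first and calling rename once also addresses the first question: it avoids renaming column by column inside the loop.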