0

I have a pandas dataframe where only a few columns are dates.

Example of a dataframe (dates here for the sake of example are str but in my case they are an object):

df = pd.DataFrame({
    "activity": ["clean dishes", "fix porch", "slep on couch"],
    "finished": ["NaT", "NaT", "2022-12-29"],
    "2022-12-27 00:00:00": [1,1,1],
    "2022-12-28 00:00:00": [1,1,1],
    "2022-12-29 00:00:00": [1,1,0]
})

    print(df.columns)
    Index(['activity', 'finished', 2022-12-27 00:00:00, 2022-12-28 00:00:00, 2022-12-29 00:00:00], dtype='object')

I want to convert the last three column names to date (don't want the timestamp included) so that I can compare the dates in the finished column with the different column names and place a zero where activity is finished before. I tried using this approach but did not work (including suggestion in the comments).

To achieve my goal I created this:

from datetime import datetime
import pandas as pd

    def format_header_dates(dataframe):
        """Converting the dates in the header to date"""
        for column in dataframe.columns:
            if isinstance(column, pd.Timestamp):
                new_column = pd.Timestamp(column).date()
                dataframe = dataframe.rename(columns={column: new_column})
    
        return dataframe

df = format_header_dates(df)

However I get this warning:

FutureWarning: Comparison of Timestamp with datetime.date is deprecated in order to match the standard library behavior. In a future version these will be considered non-comparable. Use 'ts == pd.Timestamp(date)' or 'ts.date() == date' instead.
  return key in self._engine

This leaves me with two questions:

  1. Is there a better way to convert a subset of column names to date?
  2. What exactly is causing this warning (isinstance?) and how can I make the necessary corrections?

Solution:

After spending two days scratching my head and googling, I could not pinpoint the root cause of the FutureWarning but got my way around it.

Step 1: Convert every date to datetime64[ns] and normalize it (to set h:m:s:ns to zero as I have no interest in such precision) with the following: pd.to_datetime(column).normalize().to_datetime64()

Step 2: Do whatever operations I wanted to, which in my case required comparing dates.

Step 3: Cosmetically adjust the dates by keeping only the date component with: pd.to_datetime(column).to_datetime64().astype('datetime64[D]')

This allowed me to do any date operations I wanted and no longer displayed the FutureWarning: Comparison of Timestamp with datetime.date is deprecated...

Filipe
  • 41
  • 1
  • 5
  • "*column names to date (don't want the timestamp included)*" - why? Since hour/minute/seconds are all zero you might just use the pandas datetime data type. pandas.Timestamp does not handle date and time separately, and in general it's safer to "stay within pandas" and not mix in vanilla Python datetime/date/time. That's likely why you get the warning. – FObersteiner Dec 29 '22 at 14:05
  • I am interested in comparing the day only not the time when a given activity is finished, that's why I was trying to remove the time portion of it. Also, this will all be dumped to Excel and wanted to have only the date without the timestamp in the column header. I'll try your suggestion to get rid of the warning and report if it worked. – Filipe Dec 29 '22 at 14:20
  • I tried using `pd.to_datetime(column).date()` and still get the warning. It seems the warning is only visible once line `dataframe = dataframe.rename(columns={column: new_column})` executes. – Filipe Dec 29 '22 at 15:30
  • Tried removing `.date()` altogether? If you need datetime in a certain format, you can still `strftime`... – FObersteiner Dec 29 '22 at 15:52
  • That also did not work. I found a way around it. See my separate answer. Thank you for the help though! – Filipe Dec 30 '22 at 13:23

0 Answers0