1

This seems like a straightforward question, but I've been stuck for a while on it now. Apologies if this has been asked already. I have the following pandas dataframe:

import pandas as pd
zed = pd.DataFrame({'gameDate': {0: datetime.date(2019, 12, 12),
  1: datetime.date(2019, 12, 12),
  2: datetime.date(2019, 12, 12),
  3: datetime.date(2019, 12, 12),
  4: datetime.date(2019, 12, 12)},
 'periodType': {0: 'REGULAR',
  1: 'REGULAR',
  2: 'REGULAR',
  3: 'REGULAR',
  4: 'REGULAR'}})

and when I check to see the dtypes of this dataframe, I get the following output:

print(zed.dtypes)

gameDate      object
periodType    object
dtype: object

How can the date types be distinguished from the string types when both are returned as object? I have a much larger dataframe for which I need to identify all columns of type date (not datetime or timestamp, but specifically date only, like above), but I don't specifically know the column names. I'd like to use something like the .dtypes function to help reveal these columns.

I could create a function that tests if the string is of the format yyyy-mm-dd, but that seems tedious.

Thanks!

Canovice
  • 9,012
  • 22
  • 93
  • 211
  • 1
    Is there a specific reason you need to keep the `datetime.date`? `pandas` is compatible with the `np.datetime64[ns]` and `np.timedelta64[ns]` dtypes. Because the datetime module is not the supported library there are several issues that you can encounter: https://stackoverflow.com/questions/60888277/select-data-based-on-datetime-in-pandas-dataframe/60888432#60888432 and https://stackoverflow.com/questions/60256108/datetime-timestamp-returns-different-values-in-pandas-apply-and-dataframe-select/60256274#60256274 are two examples where pandas diverges from the standard datetime module. – ALollz Jul 28 '20 at 21:53
  • The real data is fetched from a SQL database into python. It is a `date` type in the database, and it seems looks like python / pandas is simply auto-casting the column into `datetime.date`. – Canovice Jul 28 '20 at 21:55
  • I'd append to my question then "how can i cast into `np.datetime64[ns]`", but this can be looked up easily – Canovice Jul 28 '20 at 21:57
  • 1
    Missing values might complicate this, but you could check the first value of each column: `zed.iloc[0].apply(type)`? – ALollz Jul 28 '20 at 22:01

1 Answers1

1

You have two options:

1- Use pd.datetime in the construction of your 'gameDate' dictionary as follow:

zed = pd.DataFrame(
    {'gameDate': {
        0: pd.datetime(2019, 12, 12),
        1: pd.datetime (2019, 12, 12),
        2: pd.datetime(2019, 12, 12),
        3: pd.datetime(2019, 12, 12),
        4: pd.datetime(2019, 12, 12)
    },
     'periodType': {
         0: 'REGULAR',
         1: 'REGULAR',
         2: 'REGULAR',
         3: 'REGULAR',
         4: 'REGULAR'
     }
    }
)
  • but pd.datetime is deprecated and will be removed from pandas in the future.

2- Convert the column after declaring the dataframe (probably the best practice). Just use:

zed = pd.DataFrame(
    {'gameDate': {
        0: datetime.date(2019, 12, 12),
        1: datetime.date(2019, 12, 12),
        2: datetime.date(2019, 12, 12),
        3: datetime.date(2019, 12, 12),
        4: datetime.date(2019, 12, 12)
    },
    'periodType': {
        0: 'REGULAR',
        1: 'REGULAR',
        2: 'REGULAR',
        3: 'REGULAR',
        4: 'REGULAR'
    }
   }
)

zed['gameDate'] = pd.to_datetime(zed['gameDate'])