I have a huge DataFrame whose columns are never in the same order, and I don't know their names.

How can I find all the columns that have a datetime type?

In most of the solutions I've found online, the poster knows the name of the column, so I'm having a bit of trouble because I don't. What can I do in this situation?

miradulo
J. Doe
  • What _else_ do the columns have? Other types? `NaN`? Can you add some sample data please? – miradulo Feb 19 '17 at 02:22
  • I have over 100 columns: some contain only ints, some are boolean, some are alphanumeric. The datetime columns don't contain NaN. (There are only 2 of them, and I just want to get any one of them.) Let's assume there are no NaNs; I didn't find any in a sample of over 10 datasets. – J. Doe Feb 19 '17 at 02:31
  • Related, for numeric columns: https://stackoverflow.com/questions/25039626/how-do-i-find-numeric-columns-in-pandas#28155580 – Eulenfuchswiesel Feb 19 '20 at 11:30

4 Answers

You can use pandas.DataFrame.select_dtypes(), and include only the datetime64 type.

df.select_dtypes(include=['datetime64'])

Demo

>>> df
         dts1       dts2  ints
0  2012-01-01 2004-01-01     0
1  2012-01-02 2004-01-02     1
2  2012-01-03 2004-01-03     2
..        ...        ...   ...
97 2012-04-07 2004-04-07    97
98 2012-04-08 2004-04-08    98
99 2012-04-09 2004-04-09    99

>>> df.select_dtypes(include=['datetime64'])
         dts1       dts2
0  2012-01-01 2004-01-01
1  2012-01-02 2004-01-02
2  2012-01-03 2004-01-03
..        ...        ...
97 2012-04-07 2004-04-07
98 2012-04-08 2004-04-08
99 2012-04-09 2004-04-09
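
Because select_dtypes() returns a DataFrame, the column names themselves are available through its .columns attribute. The following is a minimal sketch that assumes a frame built to mirror the demo above. The data is illustrative and is not the original poster's:

```python
import pandas as pd

# Hypothetical frame mirroring the demo: two datetime columns and one int column
df = pd.DataFrame({
    "dts1": pd.date_range("2012-01-01", periods=100),
    "dts2": pd.date_range("2004-01-01", periods=100),
    "ints": range(100),
})

# select_dtypes returns a sub-DataFrame; .columns gives just the matching names
datetime_cols = df.select_dtypes(include=["datetime64"]).columns.tolist()
print(datetime_cols)  # ['dts1', 'dts2']

# If any one datetime column will do, take the first name
first_dt_col = datetime_cols[0]
```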
miradulo
Since each column of a pandas DataFrame is a pandas Series, simply iterate through the list of column names and check whether each Series' dtype is datetime (typically datetime64[ns]):

for col in df.columns:
    if df[col].dtype == 'datetime64[ns]':
        print(col)

Or as list comprehension:

[col for col in df.columns if df[col].dtype == 'datetime64[ns]']

Or as a series filter:

df.dtypes[df.dtypes=='datetime64[ns]']
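
Comparing against the string 'datetime64[ns]' only matches timezone-naive, nanosecond-resolution columns. As an alternative that this answer does not use, pandas.api.types.is_datetime64_any_dtype also accepts timezone-aware dtypes such as datetime64[ns, UTC]. The following is a minimal sketch on a small hypothetical frame:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical frame: one tz-naive datetime column, one tz-aware one, one int column
df = pd.DataFrame({
    "naive": pd.Series(["2020-01-01", "2020-01-02"], dtype="datetime64[ns]"),
    "aware": pd.to_datetime(["2020-01-01", "2020-01-02"], utc=True),
    "ints": [1, 2],
})

# Exact dtype comparison, as in the answer: misses the tz-aware column
exact = [col for col in df.columns if df[col].dtype == "datetime64[ns]"]

# is_datetime64_any_dtype matches both tz-naive and tz-aware datetime columns
any_dt = [col for col in df.columns if is_datetime64_any_dtype(df[col])]

print(exact)   # ['naive']
print(any_dt)  # ['naive', 'aware']
```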
Parfait
For datetime columns that are timezone-aware (e.g. datetime64[ns, UTC]), here is a more general solution:

def get_datetime_columns_of_data_frame(df):
    # dtypes as data frame
    df_type = df.dtypes.rename_axis('column')\
        .to_frame('dtype')\
        .reset_index(drop=False)
    # dtype as string for easier filtering
    df_type['dtype_str'] = df_type['dtype'].map(str)
    return df_type[df_type['dtype_str'].str.contains('datetime64')]['column'].tolist()
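
The same string-matching idea can be written more compactly by iterating over df.dtypes directly. In the following sketch, the helper name datetime_columns and the sample frame are hypothetical:

```python
import pandas as pd

def datetime_columns(df):
    # str(dtype) looks like 'datetime64[ns]' (naive) or 'datetime64[ns, UTC]' (tz-aware)
    return [col for col, dtype in df.dtypes.items() if str(dtype).startswith("datetime64")]

# Hypothetical frame with a naive datetime column, a tz-aware one, and a boolean one
df = pd.DataFrame({
    "naive": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "aware": pd.to_datetime(["2020-01-01", "2020-01-02"], utc=True),
    "flag": [True, False],
})

print(datetime_columns(df))  # ['naive', 'aware']
```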
Orestis Tsinalis
I'm putting up this answer for two reasons:

  1. It works
  2. I want someone to improve it. There should be some sort of 'all_datetime_types' thingy in pandas, and I'm probably just missing it.

datetime_types = ["datetime", "datetime64", "datetime64[ns]", "datetimetz"]
for c in df.select_dtypes(include=datetime_types).columns:
    print(f"Doing something with column {c}...")
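
According to the select_dtypes documentation, 'datetime' (or 'datetime64') selects timezone-naive datetime columns and 'datetimetz' selects timezone-aware ones. This suggests the list can probably be shortened to those two entries. The following is a minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame with a naive datetime column, a tz-aware one, and a string column
df = pd.DataFrame({
    "naive": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "aware": pd.to_datetime(["2020-01-01", "2020-01-02"], utc=True),
    "name": ["a", "b"],
})

# 'datetime' covers tz-naive datetime64 columns; 'datetimetz' covers tz-aware ones
dt_cols = df.select_dtypes(include=["datetime", "datetimetz"]).columns
for c in dt_cols:
    print(f"Doing something with column {c}...")
```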
Brian Wylie