Convert Pandas Column to DateTime

Question

I have one field in a pandas DataFrame that was imported as string format.

It should be a datetime variable. How do I convert it to a datetime column, and then filter based on date?

Example:

raw_data = pd.DataFrame({'Mycol': ['05SEP2014:00:00:00.000']})

score 794 · Accepted Answer · edited May 24 '23 at 01:35

Use the to_datetime function, specifying a format to match your data.

df['Mycol'] = pd.to_datetime(df['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
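
The question also asks how to filter by date once the column is converted. Here is a minimal sketch of that step; the extra rows and the cutoff date are made up for illustration and are not from the question.

```python
import pandas as pd

# Sample data in the question's format; rows beyond the first are made up.
raw_data = pd.DataFrame({
    'Mycol': ['05SEP2014:00:00:00.000',
              '17OCT2014:08:30:00.000',
              '02JAN2015:23:59:59.500'],
})

# Parse with an explicit format that matches the data.
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')

# Filter with an ordinary comparison against a Timestamp (illustrative cutoff).
cutoff = pd.Timestamp('2014-10-01')
after_cutoff = raw_data[raw_data['Mycol'] >= cutoff]

# The .dt accessor exposes date parts, e.g. keep only rows from 2014.
in_2014 = raw_data[raw_data['Mycol'].dt.year == 2014]
```

Once the column has a datetime dtype, boolean masks like these behave the same way as numeric comparisons.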

edited May 24 '23 at 01:35 by wjandrea · answered Nov 05 '14 at 17:50 by chrisb

168

Note: the `format` argument isn't required. `to_datetime` is smart. Go ahead and try it without trying to match your data. – samthebrand Apr 22 '17 at 18:54
3

`format` is not required but passing it makes the conversion run much, much faster. See [this answer](https://stackoverflow.com/a/75277434/19123103) for more info. – cottontail Jan 29 '23 at 18:45
More correctly, in the case of the OP, `format` is required; otherwise a `DateParseError` occurs. `pandas` can infer some string formats, but, as pointed out, using `format` greatly improves performance. – Trenton McKinney May 26 '23 at 15:04

score 104 · Answer 2 · answered Mar 17 '19 at 13:52

If you have more than one column to be converted you can do the following:

df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)

answered Mar 17 '19 at 13:52 by Vlad Bezden

If you have different datetime formats in these columns, you can try using `format` parameter like: `apply(pd.to_datetime, format='mixed')` – Rafs Jul 12 '23 at 16:47

score 70 · Answer 3 · edited May 26 '23 at 05:30

Edit: I recommend using pd.to_datetime() instead of this approach, because .apply() is generally slower.

You can use the Series method .apply() to operate on the values in Mycol:

>>> import pandas as pd
>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'], columns=['Mycol'])
>>> df
                    Mycol
0  05SEP2014:00:00:00.000
>>> import datetime as dt
>>> df['Mycol'] = df['Mycol'].apply(lambda x: 
...     dt.datetime.strptime(x, '%d%b%Y:%H:%M:%S.%f'))
>>> df
       Mycol
0 2014-09-05

edited May 26 '23 at 05:30 · answered Nov 05 '14 at 17:51 by mechanical_meat

Why use this over `pd.to_datetime`? – wjandrea May 24 '23 at 01:36
1

i probably hadn't yet seen `pd.to_datetime` when i wrote this. added a recommendation to use `pd.to_datetime`. thanks for the comment. – mechanical_meat May 26 '23 at 05:30

score 44 · Answer 4 · edited May 24 '23 at 01:35

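Use the pandas to_datetime function to parse the column as a datetime. Passing infer_datetime_format=True asks pandas to detect the format automatically. Note that this argument is deprecated as of pandas 2.0, where a strict version of this inference is already the default, and that inference still fails on the string in the question (see the comments on the other answers), so format= is needed there.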

import pandas as pd
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)

edited May 24 '23 at 01:35 by wjandrea · answered Sep 23 '19 at 10:30 by Prateek Sharma

score 16 · Answer 5 · edited May 24 '23 at 01:39

Time Saver:

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'])

edited May 24 '23 at 01:39 by wjandrea · answered Oct 29 '21 at 16:44 by Gil Baggio

1

This doesn't work for this specific use case. It gives a `ParserError: Unknown string format: 05SEP2014:00:00:00.000`. – Gonçalo Peres Oct 05 '22 at 10:30

score 6 · Answer 6 · edited May 24 '23 at 01:29

To silence `SettingWithCopyWarning`

If you got this warning, then that means your dataframe was probably created by filtering another dataframe. Make a copy of your dataframe before any assignment and you're good to go.

df = df.copy()
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')
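
To make the situation concrete, here is a small sketch of the filter-then-assign pattern that typically produces the warning, with the copy applied. The frame `orders` and its columns are hypothetical, and whether the warning appears at all depends on the pandas version.

```python
import pandas as pd

# Hypothetical data: 'date' holds strings in the question's format.
orders = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['05SEP2014:00:00:00.000',
             '06SEP2014:12:00:00.000',
             '07SEP2014:18:30:00.000'],
})

# Filtering produces a new object derived from `orders`; assigning a column
# to it directly is what can trigger SettingWithCopyWarning.
# An explicit .copy() makes the subset independent of the original.
subset = orders[orders['id'] > 1].copy()
subset['date'] = pd.to_datetime(subset['date'], format='%d%b%Y:%H:%M:%S.%f')

# Only the copy was converted; the original column still holds strings.
print(orders['date'].dtype)
print(subset['date'].dtype)
```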

`errors='coerce'` is useful

If some rows are not in the correct format, or are not datetimes at all, the errors= parameter is very useful: with errors='coerce', invalid values become NaT, so you can convert the valid rows and handle the invalid ones later.

df['date'] = pd.to_datetime(
    df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

# for multiple columns
df[['start', 'end']] = df[['start', 'end']].apply(
    pd.to_datetime, format='%d%b%Y:%H:%M:%S.%f', errors='coerce')
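
As a sketch of the "handle them later" step, the snippet below coerces a column and then picks out the rows that had a value but failed to parse. The sample values are made up.

```python
import pandas as pd

# Made-up sample: one valid value, one invalid string, one missing value.
df = pd.DataFrame({'date': ['05SEP2014:00:00:00.000', 'not a date', None]})

# Invalid and missing values both become NaT instead of raising.
parsed = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

# Rows that had a value but could not be parsed, kept for later inspection.
bad_rows = df[parsed.isna() & df['date'].notna()]

df['date'] = parsed
```

Comparing against the original column before overwriting it distinguishes values that were genuinely missing from values that were malformed.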

Setting the correct `format=` is much faster than letting pandas find out¹

Long story short, passing the correct format= from the start, as in chrisb's post, is much faster than letting pandas figure out the format, especially if the format contains a time component. For dataframes with more than 10k rows the runtime difference is huge (~25 times faster, i.e. a couple of minutes vs. a few seconds). All valid format options can be found at https://strftime.org/.

¹ Code used to produce the timeit test plot.

import pandas as pd
import perfplot
from random import choices
from datetime import datetime

mdYHMSf = range(1,13), range(1,29), range(2000,2024), range(24), *[range(60)]*2, range(1000)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), 
             lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M:%S.%f'), 
             lambda x: pd.to_datetime(x, infer_datetime_format=True),
             lambda s: s.apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))],
    labels=["pd.to_datetime(df['date'])", 
            "pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S.%f')", 
            "pd.to_datetime(df['date'], infer_datetime_format=True)", 
            "df['date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))"],
    n_range=[2**k for k in range(20)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}:{S}.{f}" 
                               for m,d,Y,H,M,S,f in zip(*[choices(e, k=n) for e in mdYHMSf])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)
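
If perfplot is not installed, a rough comparison can be sketched with the standard-library timeit module instead. The sample data below is made up, and the timings depend heavily on the pandas version: newer releases infer the format from the first value, so the gap may be much smaller than the figure quoted above.

```python
import timeit
from datetime import datetime, timedelta

import pandas as pd

# Made-up sample: 10,000 zero-padded timestamps one minute apart.
fmt = '%m/%d/%Y %H:%M:%S.%f'
start = datetime(2020, 1, 1)
s = pd.Series([(start + timedelta(minutes=i)).strftime(fmt) for i in range(10_000)])

# Time each variant a few times and keep the best run.
t_infer = min(timeit.repeat(lambda: pd.to_datetime(s), number=1, repeat=3))
t_format = min(timeit.repeat(lambda: pd.to_datetime(s, format=fmt), number=1, repeat=3))
print(f"inferred: {t_infer:.4f}s  explicit format: {t_format:.4f}s")

# Both variants should agree on the parsed values for this sample.
same = pd.to_datetime(s).equals(pd.to_datetime(s, format=fmt))
```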

If the column contains multiple formats, see Convert a column of mixed format strings to a datetime Dtype.

score -2 · Answer 7 · edited May 24 '23 at 01:27

Just like we convert object data type to float or int, use astype().

raw_data['Mycol'] = raw_data['Mycol'].astype('datetime64[ns]')

edited May 24 '23 at 01:27 by wjandrea · answered Aug 23 '22 at 08:12 by Amar nayak

1

This doesn't work for this specific use case. It gives a `ParserError: Unknown string format: 05SEP2014:00:00:00.000`. – Gonçalo Peres Oct 05 '22 at 10:28
