How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetime dtype?
6 Answers
The easiest way is to use to_datetime
:
df['col'] = pd.to_datetime(df['col'])
It also offers a dayfirst
argument for European times (but beware this isn't strict).
Here it is in action:
In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0 2005-05-23 00:00:00
dtype: datetime64[ns]
You can pass a specific format:
In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0 2005-05-23
dtype: datetime64[ns]

- 56,955
- 33
- 144
- 158

- 359,921
- 101
- 625
- 535
If your date column is a string of the format '2017-01-01' you can use pandas astype to convert it to datetime.
df['date'] = df['date'].astype('datetime64[ns]')
or use datetime64[D] if you want Day precision and not nanoseconds
print(type(df_launath['date'].iloc[0]))
yields
<class 'pandas._libs.tslib.Timestamp'>
the same as when you use pandas.to_datetime
You can try it with other formats then '%Y-%m-%d' but at least this works.

- 1,335
- 13
- 13
-
fyi when timezone is specified in the string it ignores it – fantabolous Sep 02 '22 at 09:03
You can use the following if you want to specify tricky formats:
df['date_col'] = pd.to_datetime(df['date_col'], format='%d/%m/%Y')
More details on format
here:

- 3,591
- 2
- 25
- 26

- 943
- 10
- 9
If you have a mixture of formats in your date, don't forget to set infer_datetime_format=True
to make life easier.
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
Source: pd.to_datetime
or if you want a customized approach:
def autoconvert_datetime(value):
formats = ['%m/%d/%Y', '%m-%d-%y'] # formats to try
result_format = '%d-%m-%Y' # output format
for dt_format in formats:
try:
dt_obj = datetime.strptime(value, dt_format)
return dt_obj.strftime(result_format)
except Exception as e: # throws exception when format doesn't match
pass
return value # let it be if it doesn't match
df['date'] = df['date'].apply(autoconvert_datetime)
-
A customized approach can be used without resorting to `.apply` which has no fast cache, and will struggle when converting a billion values. An alternative, but not a great one, is `col = pd.concat([pd.to_datetime(col, errors='coerce', format=f) for f in formats], axis='columns').bfill(axis='columns').iloc[:, 0]` – Asclepius Dec 03 '20 at 22:13
-
3If you have a mixture of formats, you **should not use `infer_datetime_format=True`** as this assumes a single format. Just skip this argument. To understand why, try `pd.to_datetime(pd.Series(['1/5/2015 8:08:00 AM', '1/4/2015 11:24:00 PM']), infer_datetime_format=True)` with and without `errors='coerce'`. See [this issue](https://github.com/pandas-dev/pandas/issues/25143). – Asclepius Dec 09 '20 at 02:01
Multiple datetime columns
If you want to convert multiple string columns to datetime, then using apply()
would be useful.
df[['date1', 'date2']] = df[['date1', 'date2']].apply(pd.to_datetime)
You can pass parameters to to_datetime
as kwargs.
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime, format="%m/%d/%Y")
Passing to apply
, without specifying axis
, still converts values vectorially for each column. apply
is needed here because pd.to_datetime
can only be called on a single column. If it has to be called on multiple columns, the options are either use an explicit for-loop
, or pass it to apply
. On the other hand, if you call pd.to_datetime
using apply
on a column (e.g. df['date'].apply(pd.to_datetime))
, that would not be vectorized, and should be avoided.
Use format=
to speed up
If the column contains a time component and you know the format of the datetime/time, then passing the format explicitly would significantly speed up the conversion. There's barely any difference if the column is only date, though. In my project, for a column with 5 millions rows, the difference was huge: ~2.5 min vs 6s.
It turns out explicitly specifying the format is about 25x faster. The following runtime plot shows that there's a huge gap in performance depending on whether you passed format or not.
The code used to produce the plot:
import perfplot
import random
mdYHM = range(1, 13), range(1, 29), range(2000, 2024), range(24), range(60)
perfplot.show(
kernels=[lambda x: pd.to_datetime(x), lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M')],
labels=['pd.to_datetime(x)', "pd.to_datetime(x, format='%m/%d/%Y %H:%M')"],
n_range=[2**k for k in range(19)],
setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}"
for m,d,Y,H,M in zip(*[random.choices(e, k=n) for e in mdYHM])]),
equality_check=pd.Series.equals,
xlabel='len(df)'
)

- 56,955
- 33
- 144
- 158

- 10,268
- 18
- 50
- 51
Try this solution:
- Change
'2022–12–31 00:00:00' to '2022–12–31 00:00:01'
- Then run this code:
pandas.to_datetime(pandas.Series(['2022–12–31 00:00:01']))
- Output:
2022–12–31 00:00:01

- 11
- 1