1

I have a row of messy data where date formats are different and I want them to be coherent as datetime in pandas

df:
          Date
0    1/05/2015
1  15 Jul 2009
2     1-Feb-15
3   12/08/2019

When I run this part:

df['date'] = pd.to_datetime(df['date'], format='%d %b %Y', errors='coerce')

I get

        Date
0        NaT
1 2009-07-15
2        NaT
3        NaT

How do I convert it all to date time in pandas?

1 Answers1

3

pd.to_datetime is capabale of handling multiple date formats in the same column. Specifying a format will hinder its ability to dynamically determine the format, so if there are multiple types do not specify the format:

import pandas as pd

df = pd.DataFrame({
    'Date': ['1/05/2015', '15 Jul 2009', '1-Feb-15', '12/08/2019']
})

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
print(df)
        Date
0 2015-01-05
1 2009-07-15
2 2015-02-01
3 2019-12-08

*There are limitations to the ability to handle multiple date times. Mixed timezone aware and timezone unaware datetimes will not process correctly. Likewise mixed dayfirst and monthfirst notations will not always parse correctly.

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • That worked! Thanks. Can you help me understand what do you mean by 'dayfirst and monthfirst notations' ? –  Jul 13 '21 at 02:48
  • 1
    The string '01/03/2020' in dayfirst notation would be 'March 01 2020' and monthfirst notation would make this date 'January 03 2020'. If you have mixed notation like that it's not going to be able to always correctly determine the difference/correct date. – Henry Ecker Jul 13 '21 at 02:49
  • Thats interesting because usually the data coming in is pretty messy however, when the date is greater than 12 then it has to be a date and such. How do I accomodate for this logic? I understand that I am asking a bit much however can I code this part to be a bit more resilient? –  Jul 13 '21 at 02:51
  • It will most certainly handle something like '31/12/2015' with no problem (as this _must_ be dayfirst notation). This is not ambiguous. But it is not _possible_ to differentiate something like '01/03/2020'. – Henry Ecker Jul 13 '21 at 02:53
  • 1
    Ah! this makes my life a bit harder lol. However it is out of the purview of this question. I will ask another one when I encounter this issue. Thanks a ton! –  Jul 13 '21 at 02:54