6

I read a file into a pandas dataframe with dates that vary in their format:

  • either ISO 8601 style: YYYY-MM-DD

  • or the European: DD.MM.YYYY

They come in as strings. I would like to convert them all to datetime values so pandas.Series.dt can work with them and ideally have them in the second format (DD.MM.YYYY).

pandas.Series.dt gets confused by the two different formats in one column.

smci
user9092346

2 Answers

7

Call to_datetime once per format with errors='coerce', so that values that do not match the given format become missing values (NaT). Then combine the two results with Series.fillna:

import pandas as pd

df = pd.DataFrame({'date': ['2000-01-12', '2015-01-23', '20.12.2015', '31.12.2009']})
print(df)
         date
0  2000-01-12
1  2015-01-23
2  20.12.2015
3  31.12.2009

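# Parse each format separately; rows that do not match become NaT
date1 = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d')
date2 = pd.to_datetime(df['date'], errors='coerce', format='%d.%m.%Y')
# Fill the gaps left by the first format with values from the second
df['date'] = date1.fillna(date2)
print(df)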
        date
0 2000-01-12
1 2015-01-23
2 2015-12-20
3 2009-12-31
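
If more than two formats turn up, the same coerce-then-fillna idea can be written as a loop. The following is only a sketch under assumptions: the `formats` list and the sample value 'not a date' are made up for illustration and are not part of the question's data. It also shows one way to find rows that matched none of the formats.

```python
import pandas as pd

# Sample data; 'not a date' is a hypothetical bad value for illustration
raw = pd.Series(['2000-01-12', '20.12.2015', 'not a date'])

# Candidate formats, tried in order (assumed list, extend as needed)
formats = ['%Y-%m-%d', '%d.%m.%Y']

# Start with the first format; non-matching rows become NaT
parsed = pd.to_datetime(raw, errors='coerce', format=formats[0])

# Fill the remaining gaps with each further format
for fmt in formats[1:]:
    parsed = parsed.fillna(pd.to_datetime(raw, errors='coerce', format=fmt))

# Rows that matched none of the formats are still NaT
unparsed = raw[parsed.isna()]
print(unparsed.tolist())
```

Checking `unparsed` before overwriting the original column is a cheap way to notice a third format that was not expected.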

and ideally have them in the second format

pandas displays datetimes as YYYY-MM-DD by default. A custom format is possible, but the values are then converted to strings, so datetime-like functions (such as the .dt accessor) no longer work on them:

df['date'] = df['date'].dt.strftime('%d.%m.%Y')
print (df)
         date
0  12.01.2000
1  23.01.2015
2  20.12.2015
3  31.12.2009

print (type(df.loc[0, 'date']))
<class 'str'>
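
One possible way around this trade-off, sketched here as a suggestion rather than as part of the original answer, is to keep the parsed datetime column for computations and put the DD.MM.YYYY strings in a separate column. The column name `date_str` is a hypothetical name chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2000-01-12', '2015-01-23', '20.12.2015', '31.12.2009']})

# Parse each format separately; non-matching rows become NaT
iso = pd.to_datetime(df['date'], errors='coerce', format='%Y-%m-%d')
dotted = pd.to_datetime(df['date'], errors='coerce', format='%d.%m.%Y')

# Keep a real datetime column so the .dt accessor keeps working
df['date'] = iso.fillna(dotted)

# 'date_str' is a hypothetical display-only column holding DD.MM.YYYY strings
df['date_str'] = df['date'].dt.strftime('%d.%m.%Y')

print(df['date'].dt.year.tolist())   # [2000, 2015, 2015, 2009]
print(df['date_str'].tolist())       # ['12.01.2000', '23.01.2015', '20.12.2015', '31.12.2009']
```

The string column can be used when writing reports or exporting, while sorting, filtering and date arithmetic stay on the datetime column.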
jezrael
0

Just check which of the two formats each value has and apply pandas.to_datetime with that format.

df = pd.DataFrame({'date': ['2000-01-12', '2015-01-23', '20.12.2015', '31.12.2009']})
print(df)
         date
0  2000-01-12
1  2015-01-23
2  20.12.2015
3  31.12.2009

def date_formator(date):

    if '-' in date:
        return pd.to_datetime(date, format = '%Y-%m-%d')
    else:
        return pd.to_datetime(date, format = '%d.%m.%Y')

df.date.apply(date_formator)
0   2000-01-12
1   2015-01-23
2   2015-12-20
3   2009-12-31
Name: date, dtype: datetime64[ns]
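
The same check can also be done on the whole column at once instead of row by row, which avoids calling to_datetime once per value. This is a sketch of that variant, assuming (as in the function above) that a '-' in the string reliably identifies the YYYY-MM-DD format; the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2000-01-12', '2015-01-23', '20.12.2015', '31.12.2009']})

# Boolean mask: True where the string contains a dash (YYYY-MM-DD layout)
is_dashed = df['date'].str.contains('-', regex=False)

# Parse each group in a single call with its own format
dashed = pd.to_datetime(df.loc[is_dashed, 'date'], format='%Y-%m-%d')
dotted = pd.to_datetime(df.loc[~is_dashed, 'date'], format='%d.%m.%Y')

# Put the two groups back together in the original row order
parsed = pd.concat([dashed, dotted]).sort_index()
print(parsed)
```

Unlike the errors='coerce' approach, this version raises an error if a value in either group does not match its expected format, which may or may not be what is wanted.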