Pandas parse non-English string dates

Question

Pandas is pretty great at parsing string dates when they are in English:

In [1]: pd.to_datetime("11 January 2014 at 10:50AM")
Out[1]: Timestamp('2014-01-11 10:50:00')

I'm wondering if there's an easy way to do the same with pandas when the strings are in another language, for example French:

In [2]: pd.to_datetime("11 Janvier 2016 à 10:50")

ValueError: Unknown string format

Ideally, there would be a way to do it directly in pd.read_csv.

score 13 · Answer 1 · answered Dec 09 '16 at 00:50

There is a module named dateparser that can handle numerous languages, including French, Russian, Spanish, Dutch and over 20 more. It can also recognize things like time zone abbreviations.

Let's confirm it works for a single date:

In [1]: import dateparser
        dateparser.parse('11 Janvier 2016 à 10:50')
Out[1]: datetime.datetime(2016, 1, 11, 10, 50)

Moving on to parsing this test_dates.csv file:

               Date  Value
0    7 janvier 1983     10
1  21 décembre 1986     21
2    1 janvier 2016     12

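You can pass dateparser.parse directly as the date_parser argument of pd.read_csv (note that date_parser is deprecated since pandas 2.0; on newer versions, load the column as strings and convert it after loading):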

In [2]: df = pd.read_csv('test_dates.csv',
                         parse_dates=['Date'], date_parser=dateparser.parse)
        print(df)

Out [2]:
        Date  Value
0 1983-01-07     10
1 1986-12-21     21
2 2016-01-01     12

If you need to do this after the dataframe has already been loaded, you can use apply or map:

# Using apply (6.22 ms per loop)
df.Date = df.Date.apply(lambda x: dateparser.parse(x))

# Or map which is slightly slower (7.75 ms per loop)
df.Date = df.Date.map(dateparser.parse)
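
If installing dateparser is not an option and every date follows one known layout, a small translation table may be enough. The following is only a sketch under that assumption: FRENCH_MONTHS and parse_french_date are hypothetical names introduced for illustration, and the helper only understands "day month year" with an optional "à HH:MM" suffix (it returns None for anything else, including forms such as "1er janvier").

```python
from datetime import datetime

# Hypothetical helper table: French month names mapped to month numbers.
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12,
}


def parse_french_date(text):
    """Parse strings like '11 Janvier 2016 à 10:50' or '7 janvier 1983'.

    Returns a datetime, or None when the text does not match the
    assumed 'day month year [à HH:MM]' layout.
    """
    parts = text.strip().split()
    if len(parts) not in (3, 5):
        return None
    # Month lookup is case-insensitive ('Janvier' and 'janvier' both work).
    month = FRENCH_MONTHS.get(parts[1].lower())
    if month is None:
        return None
    try:
        day, year = int(parts[0]), int(parts[2])
        hour, minute = 0, 0
        if len(parts) == 5:
            if parts[3] != "à":
                return None
            hour, minute = (int(p) for p in parts[4].split(":"))
        return datetime(year, month, day, hour, minute)
    except ValueError:
        # Non-numeric fields, a malformed time, or an impossible date.
        return None
```

Since the helper takes one string and returns one value, it should be usable in the same places as dateparser.parse, for example df.Date.map(parse_french_date).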

score 1 · Answer 2 · answered Dec 22 '20 at 15:12

It also works correctly if you set the appropriate locale and pass an explicit format for parsing:

import locale
locale.setlocale(locale.LC_ALL, 'fr_FR')

import pandas as pd
pd.to_datetime("11 Janvier 2016 à 10:50", format='%d %B %Y à %H:%M')
# Timestamp('2016-01-11 10:50:00')
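
One caveat with this approach: locale.setlocale changes the locale for the whole process and is not thread-safe, and the accepted locale names depend on the platform ('fr_FR' may need to be spelled 'fr_FR.UTF-8' or similar). A minimal sketch of limiting the change to a single call, using only the standard library, could look like the following. temporary_locale and parse_localized are hypothetical helper names, and the sketch uses datetime.strptime rather than pd.to_datetime.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_locale(name, category=locale.LC_TIME):
    """Switch `category` to `name` inside the with-block, then restore it."""
    previous = locale.setlocale(category)  # query the current setting
    locale.setlocale(category, name)       # raises locale.Error if unavailable
    try:
        yield
    finally:
        locale.setlocale(category, previous)


def parse_localized(text, fmt, locale_name):
    """Parse `text` with strptime while `locale_name` is active for LC_TIME."""
    with temporary_locale(locale_name):
        return datetime.strptime(text, fmt)
```

On a system where a French locale is installed under that name, parse_localized("11 Janvier 2016 à 10:50", "%d %B %Y à %H:%M", "fr_FR.UTF-8") should give the same date as above; if the locale is missing, locale.Error is raised and the previous setting is left untouched.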
