
Pandas is very good at parsing date strings when they are in English:

In [1]: pd.to_datetime("11 January 2014 at 10:50AM")
Out[1]: Timestamp('2014-01-11 10:50:00')

I'm wondering if there's an easy way to do the same with pandas when the strings are in another language, for example French:

In [2]: pd.to_datetime("11 Janvier 2016 à 10:50")

ValueError: Unknown string format

Ideally, there would be a way to do it directly in pd.read_csv.

FObersteiner
Julien Marrec

2 Answers


There is a module named dateparser that can handle numerous languages, including French, Russian, Spanish, Dutch and over 20 more. It can also recognize things like time zone abbreviations.

Let's confirm it works for a single date:

In [1]: import dateparser
        dateparser.parse('11 Janvier 2016 à 10:50')
Out[1]: datetime.datetime(2016, 1, 11, 10, 50)

Moving on to parsing this test_dates.csv file:

               Date  Value
0    7 janvier 1983     10
1  21 décembre 1986     21
2    1 janvier 2016     12

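You can pass dateparser.parse directly as the parser (note that the date_parser argument is deprecated as of pandas 2.0, so on recent versions you may need to read the column as strings and convert it afterwards):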

In [2]: df = pd.read_csv('test_dates.csv',
                         parse_dates=['Date'], date_parser=dateparser.parse)
        print(df)

Out [2]:
        Date  Value
0 1983-01-07     10
1 1986-12-21     21
2 2016-01-01     12

If you need to convert the column after the dataframe has already been loaded, you can use apply or map:

# Using apply (6.22 ms per loop)
df.Date = df.Date.apply(lambda x: dateparser.parse(x))

# Or map which is slightly slower (7.75 ms per loop)
df.Date = df.Date.map(dateparser.parse)
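
If adding a dependency is not an option, the same idea can be approximated with the standard library alone. The following is only a minimal sketch, not what dateparser does internally: the names FRENCH_MONTHS and parse_french_date are hypothetical, and it assumes the strings always look like "7 janvier 1983" or "11 Janvier 2016 à 10:50" with French month names.

```python
import re
from datetime import datetime

# Assumption: only French month names are needed (lower-case, accented).
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8,
    "septembre": 9, "octobre": 10, "novembre": 11, "décembre": 12,
}

# Day, month name, year, then an optional "à HH:MM" time part.
_PATTERN = re.compile(
    r"^\s*(\d{1,2})\s+(\w+)\s+(\d{4})(?:\s+à\s+(\d{1,2}):(\d{2}))?\s*$"
)


def parse_french_date(text):
    """Hypothetical helper: turn a French date string into a datetime."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognised date string: {text!r}")
    day, month_name, year, hour, minute = match.groups()
    try:
        # Month lookup is case-insensitive ("Janvier" and "janvier" both work).
        month = FRENCH_MONTHS[month_name.lower()]
    except KeyError:
        raise ValueError(f"Unknown month name: {month_name!r}") from None
    # A missing time part defaults to midnight.
    return datetime(int(year), month, int(day), int(hour or 0), int(minute or 0))
```

Such a function could be used in place of dateparser.parse in the apply or map calls above, for example df.Date.map(parse_french_date). It is far less flexible than dateparser, so it only makes sense when the input format is known and fixed.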
Julien Marrec

It also works if you set the appropriate locale and pass an explicit format for parsing (the locale name is platform-dependent; on many Linux systems it is 'fr_FR.UTF-8'):

import locale
locale.setlocale(locale.LC_ALL, 'fr_FR')

import pandas as pd
pd.to_datetime("11 Janvier 2016 à 10:50", format='%d %B %Y à %H:%M')
# Timestamp('2016-01-11 10:50:00')
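
Because locale.setlocale changes the setting for the whole process, it can be safer to switch the locale only around the parsing call and restore it afterwards. Here is a minimal sketch of that idea; temporary_locale is a hypothetical helper name, and it assumes that switching LC_TIME (rather than LC_ALL) is enough for month names, which holds for Python's own strptime.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_locale(name, category=locale.LC_TIME):
    """Hypothetical helper: use locale `name` inside a with-block only."""
    # Calling setlocale without a second argument only queries the current value.
    previous = locale.setlocale(category)
    try:
        # Raises locale.Error if the locale is not installed on this system.
        locale.setlocale(category, name)
        yield
    finally:
        # Always restore the original setting, even if parsing failed.
        locale.setlocale(category, previous)


# Intended usage (assumes a French locale named 'fr_FR.UTF-8' is installed):
#
#     with temporary_locale("fr_FR.UTF-8"):
#         ts = pd.to_datetime("11 Janvier 2016 à 10:50",
#                             format="%d %B %Y à %H:%M")
```

Note that setlocale is not thread-safe, so this pattern is best kept to single-threaded scripts.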
FObersteiner