
Can someone please tell me how I can parse a French date in Python? Sorry if the question is a duplicate, but I couldn't find one.

Here is what I have tried using the dateutil parser:

import locale
from dateutil.parser import parse as parse_dt
locale.setlocale(locale.LC_TIME, 'fr_FR.UTF-8')   # first, set the locale
parse_dt('3 juillet', fuzzy=True)   # doesn't work: returns the default (current) month
## Out[29]: datetime.datetime(2014, 10, 3, 0, 0)
parse_dt(u'4 Août ', fuzzy=True)     # same problem with another month

Edit: some added context:

I am parsing dates without knowing the format of my strings in advance. The idea is to parse many dates on the fly:

parse_dt(u"Aujourd'hui ", fuzzy=True)
parse_dt(u'Hier',fuzzy= True) 

Edit, using another library:

Using the parsedatetime library and a regular expression to translate French words, I can get this:

import parsedatetime
import re 
cal = parsedatetime.Calendar()
cal.parse(re.sub('juil.*' ,'jul' ,'20 juillet'))
 ((2015, 7, 20, 10, 25, 47, 4, 283, 1), 1)

Maybe I should generalize this to all French months?

agstudy
  • In French, I have never read or heard expression of day and month in the order "month day", but always in the order "day month". Does it work with `'3 juillet'` or `u'3 Août'`? – Jean Hominal Oct 10 '14 at 07:52
  • @JeanHominal No. I just change the order for testing , I edit to set the french one. – agstudy Oct 10 '14 at 07:55
  • 1
    @agstudy: have you tried: `datetime.strptime(date_string, '%d %B')`? – jfs Oct 10 '14 at 07:57
  • 1
    After inspecting the `dateutil` source code, it appears that `dateutil` does not seem to support locale-dependent date parsing. – Jean Hominal Oct 10 '14 at 08:05
  • @J.F.Sebastian No because I don't know the exact format before parsing. – agstudy Oct 10 '14 at 08:06
  • @JeanHominal thank you. Maybe I will use some regular expression to replace french local term to English before using dateutil parser or use another python parser. – agstudy Oct 10 '14 at 08:09
  • 2
    I don't think you can parse anything properly without knowing what the format is. At most you will get hits and misses, and I don't think that's what you want. Most probably you'll have to sit down, go through the data, pick out the most "catchable" formats and leave the rest alone... – alvas Oct 10 '14 at 08:18
  • @alvas what do you mean by properly? it is subjective, it depends , here if I parse 90% of my dates I will be happy. I think we should have a function that guess format specially when we have a vector of dates that share the same format. – agstudy Oct 10 '14 at 08:31
  • 1
    @agstudy: I've looked at [`parsedatetime` source code: it should be capable of parsing "today", "yesterday" in French if you define `pdtLocale_fr` class similar to `pdtLocale_de`](https://github.com/bear/parsedatetime/blob/master/parsedatetime/pdt_locales.py). [My attempt to define it based on `pdtLocale_icu` failed](https://gist.github.com/zed/11524fa26fa6882ad4d0) but it might work if you define the full locale without icu. – jfs Oct 10 '14 at 18:10
  • @J.F.Sebastian Thank you for your effort ! I will take a closer look at ICU. Just,FYI, I failed to install it using pip so I used something like `apt-get install python-pyicu`. – agstudy Oct 10 '14 at 18:50
  • @agstudy: [the gist](https://gist.github.com/zed/11524fa26fa6882ad4d0) works with [the github parsedatetime version](https://github.com/bear/parsedatetime). I've updated [the answer](http://stackoverflow.com/a/26295877/4279) – jfs Oct 11 '14 at 09:46

3 Answers


The dateparser module can parse the dates in the question:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import dateparser # $ pip install dateparser

for date_string in [u"Aujourd'hui", "3 juillet", u"4 Août", u"Hier"]:
    print(dateparser.parse(date_string).date())

It translates dates to English using a simple yaml config and passes the date strings to dateutil.parser.

Output

2015-09-09
2015-07-03
2015-08-04
2015-09-08
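
The translate-then-parse idea described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the `FR_TO_EN` table and the `translate_french_date` helper are hypothetical names, not part of dateparser, and the table covers just month names and three relative-day words.

# -*- coding: utf-8 -*-
import re

# Hypothetical French-to-English lookup table (not taken from dateparser).
FR_TO_EN = {
    "janvier": "January", "février": "February", "mars": "March",
    "avril": "April", "mai": "May", "juin": "June",
    "juillet": "July", "août": "August", "septembre": "September",
    "octobre": "October", "novembre": "November", "décembre": "December",
    "aujourd'hui": "today", "hier": "yesterday", "demain": "tomorrow",
}

# Match whole words only, longest keys first, ignoring case.
_WORDS = sorted(FR_TO_EN, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(w) for w in _WORDS) + r")(?!\w)",
    re.IGNORECASE,
)


def translate_french_date(text):
    """Replace known French date words with their English equivalents."""
    return _PATTERN.sub(lambda m: FR_TO_EN[m.group(1).lower()], text)

The translated string (for example "4 August" for u"4 Août") could then be handed to an English-only parser such as dateutil.parser.parse or parsedatetime.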
PLNech
jfs

First check whether the French locale is installed on your system:

$ locale -a
C
C.UTF-8
de_AT.utf8
de_BE.utf8
de_CH.utf8
de_DE.utf8
de_LI.utf8
de_LU.utf8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
POSIX

If not, do:

$ sudo locale-gen fr_FR.UTF-8
Generating locales...
  fr_FR.UTF-8... done
Generation complete.

Then go back to python:

$ python
>>> import locale
>>> import datetime
>>> locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
'fr_FR.UTF-8'
>>>
>>> date_txt = "Dimanche 3 Juin 2012"
>>> DATE_FORMAT = "%A %d %B %Y"
>>> datetime.datetime.strptime(date_txt, DATE_FORMAT)
datetime.datetime(2012, 6, 3, 0, 0)
>>>
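
The availability check can also be done from inside Python instead of reading the `locale -a` output. A minimal sketch, assuming that a failed `locale.setlocale` call (which raises `locale.Error`) means the locale is not installed; `locale_available` is a hypothetical helper name:

import locale


def locale_available(name, category=locale.LC_TIME):
    """Return True if `name` can be set for `category`, restoring the old value."""
    saved = locale.setlocale(category)  # with no second argument this only queries
    try:
        locale.setlocale(category, name)
    except locale.Error:
        return False
    else:
        return True
    finally:
        locale.setlocale(category, saved)  # put the previous setting back

Calling `locale_available('fr_FR.UTF-8')` before `strptime` makes it possible to fall back to another approach when the locale has not been generated.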

To use a custom date format:

>>> date_txt = "3 juillet"
>>> DATE_FORMAT = "%d %B"
>>> datetime.datetime.strptime(date_txt, DATE_FORMAT)
datetime.datetime(1900, 7, 3, 0, 0)

Note that if the year is not specified, it defaults to 1900.
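
One way around the 1900 default is to append the wanted year to the string before parsing. This is a sketch, not part of the original answer; `parse_with_year` is a hypothetical helper, and falling back to the current year is an assumption about what the caller wants:

from datetime import date, datetime


def parse_with_year(text, fmt, year=None):
    """Parse `text` with `fmt`, supplying `year` (default: current year)."""
    if year is None:
        year = date.today().year
    # Adding the year to both the string and the format also lets
    # 29 February parse correctly when `year` is a leap year.
    return datetime.strptime("{} {}".format(text, year), fmt + " %Y")

With the French locale set as above, `parse_with_year("3 juillet", "%d %B")` should give 3 July of the current year instead of 1900.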

Benjamin Loison
alvas
  • Thanks. I forgot to mention that I don't know the exact format before parsing. I can also get something like `Aujourd'hui` or `hier`. It looks like I should build my own translation dictionary. – agstudy Oct 10 '14 at 08:06
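
When the format is not known in advance, a personal dictionary for relative words can be combined with a list of candidate `strptime` formats that are tried in turn. The sketch below is an assumption-laden illustration: `DAY_OFFSETS`, `CANDIDATE_FORMATS` and `guess_date` are hypothetical names, and the format list would need to be adapted to the data. `%A` and `%B` follow the current `LC_TIME` locale, so French names only match after the `setlocale` call shown in the answer.

from datetime import date, datetime, timedelta

# Hypothetical personal dictionary of relative-day words.
DAY_OFFSETS = {"aujourd'hui": 0, "hier": -1, "demain": 1}

# Candidate formats, tried in order; extend to fit the data.
CANDIDATE_FORMATS = ("%A %d %B %Y", "%d %B %Y", "%d/%m/%Y")


def guess_date(text, today=None, formats=CANDIDATE_FORMATS):
    """Return a date for `text`, or None if nothing matches."""
    today = today or date.today()
    cleaned = text.strip()
    # Normalise the typographic apostrophe before the dictionary lookup.
    offset = DAY_OFFSETS.get(cleaned.lower().replace(u"\u2019", "'"))
    if offset is not None:
        return today + timedelta(days=offset)
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    return None

Strings that match none of the formats come back as None, so the misses can be collected and inspected to decide which formats to add.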
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import parsedatetime as pdt # $ pip install parsedatetime pyicu

calendar = pdt.Calendar(pdt.Constants(localeID='fr', usePyICU=True))
for date_string in [u"Aujourd'hui", "3 juillet", u"4 Août", u"Hier"]:
    dt, success = calendar.parseDT(date_string)
    if success:
        print(date_string, dt.date())

Output

3 juillet 2015-07-03
4 Août 2015-08-04

Aujourd'hui, Hier are not recognized (parsedatetime 1.4).

The current version on github (future 1.5) supports customizing the day offsets. It can be used to parse Aujourd'hui, Hier:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import parsedatetime as pdt

class pdtLocale_fr(pdt.pdt_locales.pdtLocale_icu):
    def __init__(self):
        super(pdtLocale_fr, self).__init__(localeID='fr_FR')
        self.dayOffsets.update({u"aujourd'hui": 0, u'demain': 1, u'hier': -1})

pdt.pdtLocales['fr_FR'] = pdtLocale_fr

calendar = pdt.Calendar(pdt.Constants(localeID='fr_FR', usePyICU=False))
for date_string in [u"Aujourd'hui", "3 juillet", u"4 Août", u"Hier",
                    u"au jour de hui", u"aujour-d’hui",
                    u"au-jour-d’hui", "demain", "hier",
                    u"today", "tomorrow", "yesterday"]:
    dt, rc = calendar.parseDT(date_string)
    if rc > 0:
        print(date_string, dt.date())


Output

Aujourd'hui 2014-10-11
3 juillet 2015-07-03
4 Août 2015-08-04
Hier 2014-10-10
demain 2014-10-12
hier 2014-10-10
today 2014-10-11
tomorrow 2014-10-12
yesterday 2014-10-10

To install it, run:

$ pip install git+https://github.com/bear/parsedatetime
jfs