12

dateutil is a great tool for parsing dates from strings. For example:

from dateutil.parser import parse
parse("Tue, 01 Oct 2013 14:26:00 -0300")

returns

datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))

However,

parse("Ter, 01 Out 2013 14:26:00 -0300")  # in Portuguese

yields this error:

ValueError: unknown string format

Does anybody know how to make dateutil aware of the locale?

alexwlchan
fccoelho
  • There is [this project](https://code.google.com/p/date-parser/) that adds language support to parsing with `dateutil`. I don't see Portuguese support in there, though. – Martijn Pieters Nov 12 '13 at 11:16
  • Related: http://stackoverflow.com/questions/8896038/how-to-use-python-dateutil-1-5-parse-function-to-work-with-unicode – Martijn Pieters Nov 12 '13 at 11:17
  • related: [Python strptime finnish](http://stackoverflow.com/q/33375709/4279) – jfs Mar 19 '16 at 16:42

5 Answers

5

As far as I can see, dateutil is not locale aware (yet!).

I can think of three alternative suggestions:

  • The day and month names are hardcoded in dateutil.parser (as part of the parserinfo class). You could subclass parserinfo, and replace these names with the appropriate names for Portuguese.

  • Modify dateutil to get day and month names based on the user’s locale. So you could do something like

    import locale
    locale.setlocale(locale.LC_ALL, "pt_PT")
    
    from dateutil.parser import parse
    parse("Ter, 01 Out 2013 14:26:00 -0300")
    

    I’ve started a fork which gets the names from the calendar module (which is locale-aware) to work on this: https://github.com/alexwlchan/dateutil

    Right now it works for Portuguese (or seems to), but I want to think about it a bit more before I submit a patch to the main branch. In particular, weirdness may happen if it faces characters which aren’t used in Western European languages. I haven’t tested this yet. (See https://stackoverflow.com/a/8917539/1558022)

  • If you’re not tied to the dateutil module, you could use datetime instead, which is already locale-aware:

    from datetime import datetime
    import locale
    
    locale.setlocale(locale.LC_ALL, "pt_PT")
    datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                      "%a, %d %b %Y %H:%M:%S %z")
    

    (Note that strptime does not support the %z token consistently: it is unavailable on Python 2 and works from Python 3.2 onward.)
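
The first suggestion above (subclassing parserinfo) can be sketched roughly as follows. This is a minimal illustration, not part of dateutil: the class name PortugueseParserInfo and the exact spellings chosen are assumptions. Weekday names are given without the "-feira" suffix because dateutil splits tokens on the hyphen.

```python
from dateutil import parser


class PortugueseParserInfo(parser.parserinfo):
    """Hypothetical parserinfo with hardcoded Portuguese names."""

    # Order matters: dateutil expects Monday first.
    WEEKDAYS = [
        ("Seg", "Segunda"),
        ("Ter", "Terça"),
        ("Qua", "Quarta"),
        ("Qui", "Quinta"),
        ("Sex", "Sexta"),
        ("Sáb", "Sábado"),
        ("Dom", "Domingo"),
    ]

    # January first; each tuple lists the accepted spellings of one month.
    MONTHS = [
        ("Jan", "Janeiro"),
        ("Fev", "Fevereiro"),
        ("Mar", "Março"),
        ("Abr", "Abril"),
        ("Mai", "Maio"),
        ("Jun", "Junho"),
        ("Jul", "Julho"),
        ("Ago", "Agosto"),
        ("Set", "Setembro"),
        ("Out", "Outubro"),
        ("Nov", "Novembro"),
        ("Dez", "Dezembro"),
    ]


# Pass an instance of the subclass to parse(); no locale change is needed.
dt = parser.parse("Ter, 01 Out 2013 14:26:00 -0300",
                  parserinfo=PortugueseParserInfo())
print(dt)
```

Because the names are hardcoded, this does not depend on which locales are installed on the machine, but it only understands Portuguese (English names stop working with this parserinfo).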

alexwlchan
4

You could use PyICU to parse a localized date/time string in a given format:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from datetime import datetime
import icu  # PyICU

df = icu.SimpleDateFormat(
               'EEE, dd MMM yyyy HH:mm:ss zzz', icu.Locale('pt_BR'))
ts = df.parse(u'Ter, 01 Out 2013 14:26:00 -0300')
print(datetime.utcfromtimestamp(ts))
# -> 2013-10-01 17:26:00 (UTC)

It works on Python 2/3. It does not modify global state (locale).

If your actual input string does not contain an explicit UTC offset, you should specify the timezone for ICU to use explicitly; otherwise you may get a wrong result (ICU and datetime may use different timezone definitions).

If you only need to support Python 3 and you don't mind setting the locale, you could use datetime.strptime(), as @alexwlchan suggested:

#!/usr/bin/env python3
import locale
from datetime import datetime

locale.setlocale(locale.LC_TIME, "pt_PT.UTF-8")
print(datetime.strptime("Ter, 01 Out 2013 14:26:00 -0300",
                        "%a, %d %b %Y %H:%M:%S %z")) # works on Python 3.2+
# -> 2013-10-01 14:26:00-03:00
jfs
2

The calendar module already provides locale-aware day and month names for many languages. I think the best solution is to customize the dateutil parser using these names. This is a simple solution and will work for many languages. I haven't tested it much, so use it with caution.

Create a module localeparserinfo.py and subclass parser.parserinfo:

import calendar
from dateutil import parser
    
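class LocaleParserInfo(parser.parserinfo):
    # The names are read from the locale that is active when this module is imported.
    # list() matters on Python 3: a bare zip iterator would be exhausted
    # after the first instance is created.
    WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
    MONTHS = list(zip(calendar.month_abbr, calendar.month_name))[1:]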

Now you can pass an instance of your new parserinfo class to dateutil.parser.parse.

In [1]: import locale;locale.setlocale(locale.LC_ALL, "pt_BR.utf8")
In [2]: from localeparserinfo import LocaleParserInfo

In [3]: from dateutil.parser import parse

In [4]: parse("Ter, 01 Out 2013 14:26:00 -0300", parserinfo=LocaleParserInfo())
Out[4]: datetime.datetime(2013, 10, 1, 14, 26, tzinfo=tzoffset(None, -10800))

It solved my problem, but note that it does not cover every possible date and time format. Take a look at dateutil's parser.py, especially the parserinfo class variables such as HMS. You will probably be able to fill in some of them from other constants in the calendar module.

You can even pass the locale string as an argument to your parserinfo class.
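
A rough sketch of that last idea, assuming a hypothetical class named NamedLocaleParserInfo (the name and constructor signature are illustrative, not part of dateutil). It switches LC_TIME only while reading the names from calendar and then restores the previous setting. The demo uses the "C" locale because it is always available; a name such as "pt_BR.utf8" only works if that locale is installed.

```python
import calendar
import locale

from dateutil import parser


class NamedLocaleParserInfo(parser.parserinfo):
    """Hypothetical parserinfo that takes its names from a given locale."""

    def __init__(self, locale_name, dayfirst=False, yearfirst=False):
        # Calling setlocale without a second argument only queries the setting.
        saved = locale.setlocale(locale.LC_TIME)
        locale.setlocale(locale.LC_TIME, locale_name)
        try:
            # Instance attributes shadow the English class-level defaults.
            self.WEEKDAYS = list(zip(calendar.day_abbr, calendar.day_name))
            # Index 0 of month_abbr/month_name is an empty placeholder.
            self.MONTHS = list(zip(calendar.month_abbr,
                                   calendar.month_name))[1:]
        finally:
            # Restore the previous LC_TIME setting.
            locale.setlocale(locale.LC_TIME, saved)
        # The base class builds its lookup tables from WEEKDAYS and MONTHS.
        super().__init__(dayfirst=dayfirst, yearfirst=yearfirst)


# "C" yields English names; substitute an installed locale as needed.
info = NamedLocaleParserInfo("C")
result = parser.parse("Tue, 01 Oct 2013 14:26:00 -0300", parserinfo=info)
print(result)
```

Keep in mind that locale.setlocale changes process-wide state, so this is not safe to call from several threads at once.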

neves
1

One could use a context manager to temporarily set the locale and return a custom parserinfo object.

Context Manager definition:

import calendar
import contextlib
import locale
from dateutil import parser


@contextlib.contextmanager
def locale_parser_info(localename):
    old_locale = locale.getlocale(locale.LC_TIME)
    locale.setlocale(locale.LC_TIME, localename)

    class InnerParserInfo(parser.parserinfo):
        WEEKDAYS = zip(calendar.day_abbr, calendar.day_name)
        # dots in abbreviation make dateutil raise a Parser Error exception
        MONTHS = list(zip([abr.replace(".", "") for abr in calendar.month_abbr], calendar.month_name))[1:]

    try:
        yield InnerParserInfo()
    finally:
        # Restore original locale
        locale.setlocale(locale.LC_TIME, old_locale)

The actual function just wraps the call to dateutil.parser.parse in the context manager we just defined, and uses the returned parserinfo object.

def parse_localized(datestr, date_locale="pt_PT"):
    with locale_parser_info(date_locale) as parserinfo:
        return parser.parse(datestr, parserinfo=parserinfo)

0

from dateutil.parser import parse
parse("Ter, 01 Out 2013 14:26:00 -0300",fuzzy=True)

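Result (note that fuzzy parsing skips the unrecognized Portuguese tokens, so the date returned is not 1 October):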

datetime.datetime(2013, 1, 28, 14, 26, tzinfo=tzoffset(None, -10800))
Konstantin Glukhov