
While doing some data cleaning, I noticed that dateutil.parser.parse fails to reject a certain malformed date, because it treats the first number in it as a two-digit year. Can this library be forced to treat two-digit years as invalid?

Example:

from dateutil.parser import parse
parse('22-23 February')

outputs:

datetime.datetime(2022, 2, 23, 0, 0)
Mihai Todor

I managed to work around this by passing a custom dateutil.parser.parserinfo object to dateutil.parser.parse via its parserinfo parameter. Luckily, dateutil.parser.parserinfo has a convertyear method that can be overridden in a derived class to perform extra validation on the year.

from dateutil.parser import parse, parserinfo

class NoTwoDigitYearParserInfo(parserinfo):
    def convertyear(self, year, century_specified=False):
        if year < 100 and not century_specified:  # year was written with only two digits
            raise ValueError('Two digit years are not supported.')
        return super().convertyear(year, century_specified)

parse('22-23 February', parserinfo=NoTwoDigitYearParserInfo())

outputs:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/dateutil/parser.py", line 1162, in parse
    return parser(parserinfo).parse(timestr, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/dateutil/parser.py", line 552, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/dateutil/parser.py", line 1055, in _parse
    if not info.validate(res):
  File "/usr/local/lib/python3.5/site-packages/dateutil/parser.py", line 360, in validate
    res.year = self.convertyear(res.year, res.century_specified)
  File "<stdin>", line 4, in convertyear
ValueError: Two digit years are not supported.
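For a data-cleaning pass, one option is to wrap the subclass in a helper that turns rejected strings into None instead of stopping on the first exception. This is a minimal sketch, assuming that any ValueError (or OverflowError) from parse should simply mark the value as unparseable; `parse_or_none` and `_STRICT_INFO` are hypothetical names, not part of dateutil.

```python
from dateutil.parser import parse, parserinfo


class NoTwoDigitYearParserInfo(parserinfo):
    # Same idea as above: reject years written with only two digits.
    def convertyear(self, year, century_specified=False):
        if year < 100 and not century_specified:
            raise ValueError('Two digit years are not supported.')
        return super().convertyear(year, century_specified)


# Hypothetical module-level instance, reused for every call.
_STRICT_INFO = NoTwoDigitYearParserInfo()


def parse_or_none(text):
    """Return a datetime, or None if the string is rejected (hypothetical helper)."""
    try:
        return parse(text, parserinfo=_STRICT_INFO)
    except (ValueError, OverflowError):
        # Covers the two-digit-year rejection and ordinary parse failures.
        return None


raw = ['22-23 February', '23 February 2022', 'not a date']
cleaned = [parse_or_none(s) for s in raw]
```

Here `cleaned` ends up as `[None, datetime.datetime(2022, 2, 23, 0, 0), None]`, so the malformed entries can be filtered or logged afterwards.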
Mihai Todor
  • Nice. I used this to prevent the year from defaulting to the current year when no year is found. – matt_s Sep 19 '16 at 16:22
  • You're welcome! Another option is to pass `default=datetime(1900, 1, 1)` (or whichever year you wish) to `dateutil.parser.parse` and then handle the dates that come back with that default year afterwards. – Mihai Todor Sep 19 '16 at 16:53
  • I actually overrode the `validate` method instead (`if res.year is None: return False`, else `return super...`); it seemed neat. – matt_s Sep 20 '16 at 09:39
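The `validate` override described in the last comment can be written out as a runnable sketch. It assumes that `validate` returning False makes `parse` raise a ValueError, which is what recent dateutil releases do (they raise `ParserError`, a ValueError subclass); `RequireYearParserInfo` and `has_explicit_year` are illustrative names only.

```python
from dateutil.parser import parse, parserinfo


class RequireYearParserInfo(parserinfo):
    """Hypothetical parserinfo that rejects strings without an explicit year."""

    def validate(self, res):
        # res.year is None when no year token was found in the input.
        # Returning False makes parse() fail instead of silently taking
        # the year from the default (current) date.
        if res.year is None:
            return False
        return super().validate(res)


def has_explicit_year(text):
    """Hypothetical helper: True if text parses and contains a year."""
    try:
        parse(text, parserinfo=RequireYearParserInfo())
    except ValueError:
        return False
    return True
```

With this, `has_explicit_year('23 February')` returns False while `has_explicit_year('23 February 2022')` returns True. Unlike the `default=datetime(1900, 1, 1)` trick, it cannot confuse a real 1900 date with a missing year.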