15

I'm trying to validate a string that's supposed to contain a timestamp in the format of ISO 8601 (commonly used in JSON).

Python's strptime seems to be very forgiving when it comes to validating zero-padding, see code example below (note that the hour is missing a leading zero):

>>> import datetime
>>> s = '1985-08-23T3:00:00.000'
>>> datetime.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f')
datetime.datetime(1985, 8, 23, 3, 0)

It gracefully accepts a string whose hour is not zero-padded and doesn't raise a ValueError, as I would expect it to.

Is there any way to make strptime enforce zero-padding? Or is there another built-in function in Python's standard library that does?

I would like to avoid writing my own regexp for this.

sophros
Niklas9
  • 1
    You could just validate the string manually: check that the `.` is in the proper position (`str[19] == '.'`): if it's not, then there is an issue with zero padding. – TemporalWolf Aug 09 '17 at 16:56
  • Perhaps not this specific question but other matters to do with ISO8601 have been discussed on SO. One question mentions https://pypi.python.org/pypi/iso8601 which in turn mentions http://labix.org/python-dateutil. – Bill Bell Aug 09 '17 at 17:16

4 Answers

5

There is already an answer explaining that parsing ISO 8601 or RFC 3339 date/time strings with Python's strptime() is impossible: How to parse an ISO 8601-formatted date? So, to answer your question: no, there is no way in the standard Python library to reliably parse such a date. Regarding the regex suggestions, a date string like

2020-14-32T45:33:44.123

would pass a purely structural regex even though its month, day, and hour are out of range. There are lots of Python modules (search for "iso8601" on https://pypi.python.org), but building a complete ISO 8601 validator would require handling things like leap seconds, the list of possible time zone offset values, and much more.
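One standard-library call that may be worth testing here is datetime.datetime.fromisoformat (available from Python 3.7). The sketch below is not a full ISO 8601 validator: the helper name parses_as_iso is made up for illustration, and fromisoformat also accepts other ISO-style layouts (for example a missing fractional part), so it does not pin the input to one exact format. It does, however, read each field as a fixed-width number and range-check the result.

```python
import datetime


def parses_as_iso(s):
    """Return True if datetime.fromisoformat accepts s (hypothetical helper)."""
    try:
        datetime.datetime.fromisoformat(s)
    except ValueError:
        return False
    return True


print(parses_as_iso('1985-08-23T03:00:00.000'))  # True
print(parses_as_iso('1985-08-23T3:00:00.000'))   # False: hour is not two digits
print(parses_as_iso('2020-14-32T45:33:44.123'))  # False: fields out of range
```

If the input must match one specific layout, this check would still need to be combined with a length or shape test.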

Arminius
1

To make strptime validate leading zeros for you, you'll have to add your own entries to Python's private _strptime._TimeRE_cache. The solution is very hacky, most likely not very portable, and requires writing a regex, although only for the hour part of a timestamp.

Another solution would be to write your own function that uses strptime, converts the parsed date back to a string, and compares the two strings. This solution is portable, but it lacks clear error messages: you won't be able to tell whether the missing leading zero was in the hours, minutes, or seconds.
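A minimal sketch of the round-trip idea for the timestamp format from the question, assuming the input always carries exactly three fractional digits. The names FMT and parse_strict are illustrative, not part of any library. One detail to watch: strftime('%f') always writes six digits, so the result is trimmed back to milliseconds before the comparison.

```python
import datetime

FMT = '%Y-%m-%dT%H:%M:%S.%f'


def parse_strict(s):
    """Parse s with strptime and reject it unless it round-trips exactly."""
    parsed = datetime.datetime.strptime(s, FMT)  # ValueError on malformed input
    # '%f' is rendered as six digits; drop the last three to get back to
    # the millisecond precision assumed for the input.
    canonical = parsed.strftime(FMT)[:-3]
    if canonical != s:
        raise ValueError(
            "time data %r is not in canonical form %r" % (s, canonical))
    return parsed


print(parse_strict('1985-08-23T03:00:00.000'))  # 1985-08-23 03:00:00
# parse_strict('1985-08-23T3:00:00.000') raises ValueError
```

The error message can at least show the canonical form next to the input, which makes the offending field easier to spot.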

Eugene Pakhomov
  • The second solution is neat. Simple implementation: `if datetime.datetime.strptime(mydate, "%Y-%m-%d").strftime("%Y-%m-%d") != mydate: raise Exception(f"Invalid date : {mydate} (YYYY-MM-DD format expected).")`. – Skippy le Grand Gourou Jun 07 '22 at 12:52
1

You said you want to avoid a regex, but this is actually the type of problem where a regex is appropriate. As you discovered, strptime is very flexible about the input it will accept. However, the regex for this problem is relatively easy to compose:

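import re

# The dot before the milliseconds is escaped so it matches only a literal '.'.
date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}')
s_list = [
    '1985-08-23T3:00:00.000',
    '1985-08-23T03:00:00.000'
]
for s in s_list:
    # fullmatch() anchors the pattern at both ends, so trailing junk is rejected.
    if date_pattern.fullmatch(s):
        print("%s is valid" % s)
    else:
        print("%s is invalid" % s)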

Output

1985-08-23T3:00:00.000 is invalid
1985-08-23T03:00:00.000 is valid
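A regex of this kind only checks the shape of the string, so a value such as '2020-14-32T45:33:44.123' would still pass. One possible way to cover both concerns is to let the regex enforce zero-padding and let strptime enforce the value ranges. The sketch below assumes the millisecond format from the question; SHAPE, FMT, and is_valid are illustrative names.

```python
import datetime
import re

# re.ASCII restricts \d to 0-9 (by default it also matches other Unicode digits).
SHAPE = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}', re.ASCII)
FMT = '%Y-%m-%dT%H:%M:%S.%f'


def is_valid(s):
    """Check zero-padding with the regex and field ranges with strptime."""
    if SHAPE.fullmatch(s) is None:
        return False  # wrong shape: missing padding, extra characters, ...
    try:
        datetime.datetime.strptime(s, FMT)
    except ValueError:
        return False  # right shape, but e.g. month 14 or hour 45
    return True


for s in ['1985-08-23T3:00:00.000',
          '1985-08-23T03:00:00.000',
          '2020-14-32T45:33:44.123']:
    print("%s is %s" % (s, "valid" if is_valid(s) else "invalid"))
```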


  • You need to anchor the regex to the beginning and end of the string. Otherwise something like '1985-08-23T03:00:00.000jcgriuvrvrvbahv' would also match – Satyan Raina Feb 07 '20 at 14:03
0

The only thing I can think of outside of messing with Python internals is to test for the validity of the format by knowing what you are looking for.

So, if I understand correctly, the format is '%Y-%m-%dT%H:%M:%S.%f' and every field should be zero-padded. In that case you know the exact length of the string you are looking for and can reproduce the intended error yourself.

import datetime

s = '1985-08-23T3:00:00.000'
fmt = '%Y-%m-%dT%H:%M:%S.%f'

# strptime still rejects input that is malformed in other ways.
stripped = datetime.datetime.strptime(s, fmt)
# A fully zero-padded timestamp with three fractional digits is 23 characters long.
if len(s) != 23:
    raise ValueError("time data '{}' does not match format '{}'".format(s, fmt))
print(stripped)  # just for good measure

>>ValueError: time data '1985-08-23T3:00:00.000' does not match format '%Y-%m-%dT%H:%M:%S.%f'
Uvar
  • With this approach you must be careful to `strip()` your input, or a trailing newline might give you a false good value. – TemporalWolf Aug 09 '17 at 17:28
  • Then you will run into `ValueError: unconverted data remains: ` by merit of the strptime..or am I missing something here? – Uvar Aug 09 '17 at 17:32