In Python 3, I get the following error:

ValueError: time data '\u200e07-30-200702:38 PM' does not match format '%m-%d-%Y%I:%M %p'

from datetime import datetime

dateRegistered = '\u200e07-30-200702:38 PM'
# dateRegistered = '07-30-200702:38 PM'
dateRegistered = datetime.strptime(dateRegistered, '%m-%d-%Y%I:%M %p')
print(dateRegistered)

The code above reproduces the issue. It works if I use the commented-out assignment instead, which has no `\u200e` prefix. It seems the string I am receiving is encoded somehow, but I could not work out which encoding it uses. Or do I have a non-printable character in my string? Printing it shows nothing unusual:

>>> print('\u200e07-30-200702:38 PM')
07-30-200702:38 PM
Martijn Pieters
M24

1 Answer

You have a U+200E LEFT-TO-RIGHT MARK character in your input. It's a non-printing typesetting directive, instructing anything that displays the text to switch to left-to-right mode. When printed to a console that already displays text left to right (e.g. the vast majority of terminals in the western world), the string will not look any different from one printed without the marker.
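To confirm that an invisible character like this is present, one option is to escape the string with `ascii()` and look up each non-ASCII character with the standard `unicodedata` module. This is a minimal sketch; `describe_hidden` is a hypothetical helper name, not something from the question.

```python
import unicodedata

def describe_hidden(text):
    """Hypothetical helper: report every non-ASCII character in text
    as (code point, Unicode name, general category)."""
    return [
        (f'U+{ord(ch):04X}', unicodedata.name(ch, '<unnamed>'), unicodedata.category(ch))
        for ch in text
        if ord(ch) > 127
    ]

date_registered = '\u200e07-30-200702:38 PM'

# ascii() escapes non-ASCII characters, so the invisible mark shows up.
print(ascii(date_registered))            # '\u200e07-30-200702:38 PM'
print(describe_hidden(date_registered))  # [('U+200E', 'LEFT-TO-RIGHT MARK', 'Cf')]
```

The `Cf` category ("format") is what Unicode assigns to invisible directional marks such as this one.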

Since it is not part of the date, you could just strip such characters (note that `str.strip()` only removes them from the start and end of the string):

datetime.strptime(dateRegistered.strip('\u200e'), '%m-%d-%Y%I:%M %p')

Or, if it is always present, add it explicitly to the format you are parsing, just like the -, : and space characters that are already part of your format:

datetime.strptime(dateRegistered, '\u200e%m-%d-%Y%I:%M %p')

Demo:

>>> from datetime import datetime
>>> dateRegistered = '\u200e07-30-200702:38 PM'
>>> datetime.strptime(dateRegistered.strip('\u200e'), '%m-%d-%Y%I:%M %p')
datetime.datetime(2007, 7, 30, 14, 38)
>>> datetime.strptime(dateRegistered, '\u200e%m-%d-%Y%I:%M %p')
datetime.datetime(2007, 7, 30, 14, 38)
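If the mark might not be the only invisible character, or might appear in the middle of the string where `strip()` cannot reach it, a more general approach is to drop every character in the Unicode `Cf` (format) category before parsing. This is a sketch under that assumption; `parse_registered` is a hypothetical helper name, and dropping all `Cf` characters may be broader than some inputs need.

```python
import unicodedata
from datetime import datetime

def parse_registered(text, fmt='%m-%d-%Y%I:%M %p'):
    """Hypothetical helper: remove Unicode format characters (category Cf),
    such as U+200E and U+200F, then parse what is left."""
    cleaned = ''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')
    return datetime.strptime(cleaned, fmt)

# Leading LEFT-TO-RIGHT MARK, as in the question.
print(parse_registered('\u200e07-30-200702:38 PM'))  # 2007-07-30 14:38:00
# A RIGHT-TO-LEFT MARK embedded in the middle, which strip() would not remove.
print(parse_registered('07-30-2007\u200f02:38 PM'))  # 2007-07-30 14:38:00
```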
Martijn Pieters
  • Be careful, this might work wrongly if your date started with 20. ```dateRegistered = '\u200e2007-30-200702:38 PM'``` ```>>> dateRegistered.strip('\u200e')``` ```'7-30-200702:38 PM'``` It's safer to use ```dateRegistered.replace('\u200e', '')``` – Corvax Jul 01 '22 at 17:43
  • @Corvax: that's not how Python 3 string syntax works. `'\u200e'` is a **single character**. There are no `2` or `0` characters in that string, so they won't be stripped either. `dateRegistered = '\u200e2007-30-200702:38 PM'`, then `dateRegistered.strip('\u200e')` outputs `'2007-30-200702:38 PM'` – Martijn Pieters Jul 07 '22 at 12:33
  • Thanks, my comment was for older Python 2.7 in case if somebody is still using it. – Corvax Jul 08 '22 at 14:18
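The difference discussed in these comments can be reproduced in Python 3 as a rough illustration. In a Python 2 byte-string literal, `'\u200e'` is not an escape and stays as six separate characters; a raw string mimics that here. The variable names are made up for this sketch.

```python
# Python 3: '\u200e' is one character, so strip() removes only that character.
py3_value = '\u200e2007-30-200702:38 PM'
py3_stripped = py3_value.strip('\u200e')
print(py3_stripped)  # 2007-30-200702:38 PM

# Python 2 byte-string behaviour, mimicked with raw strings:
# the marker is six characters: backslash, u, 2, 0, 0, e.
py2_marker = r'\u200e'
py2_value = r'\u200e2007-30-200702:38 PM'

# strip() treats its argument as a set of characters, so the leading
# 2s and 0s of the date are removed as well.
py2_stripped = py2_value.strip(py2_marker)
print(py2_stripped)  # 7-30-200702:38 PM

# replace() removes only the exact six-character sequence.
py2_replaced = py2_value.replace(py2_marker, '')
print(py2_replaced)  # 2007-30-200702:38 PM
```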