
I was trying to convert a string to a datetime object.

The string I got from a news feed is in the following format:

Thu, 16 Oct 2014 01:16:17 EDT

I tried converting it with datetime.strptime():

datetime.strptime('Thu, 16 Oct 2014 01:16:17 EDT','%a, %d %b %Y %H:%M:%S %Z')

And got the following error:

Traceback (most recent call last):
  File "", line 1, in datetime.strptime('Thu, 16 Oct 2014 01:16:17 EDT','%a, %d %b %Y %H:%M:%S %Z')
  File "C:\Anaconda\lib\_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data 'Thu, 16 Oct 2014 01:16:17 EDT' does not match format '%a, %d %b %Y %H:%M:%S %Z'

However, if I remove "EDT" from both the string and the format, it works:

datetime.strptime('Thu, 16 Oct 2014 01:16:17','%a, %d %b %Y %H:%M:%S')

Does anyone know how to parse that "EDT" part?

Gino Mempin
Victor Gau
  • related: [Python: parsing date with timezone from an email](http://stackoverflow.com/q/1790795/4279). – jfs Oct 18 '14 at 02:21
  • related: [How do I parse an HTTP date-string in Python?](http://stackoverflow.com/q/1471987/4279) – jfs Oct 22 '14 at 04:25

2 Answers


To parse a date in RFC 2822 format, you could use the email package:

from datetime import datetime, timedelta
from email.utils import parsedate_tz, mktime_tz

timestamp = mktime_tz(parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT"))
# -> 1413436577
utc_dt = datetime(1970, 1, 1) + timedelta(seconds=timestamp)
# -> datetime.datetime(2014, 10, 16, 5, 16, 17)

Note: parsedate_tz() assumes that EDT corresponds to a UTC offset of -0400, but that might be wrong in Australia, where EDT means +1100 (pytz uses AEDT in that case). In other words, a timezone abbreviation may be ambiguous. See Parsing date/time string with timezone abbreviated name in Python?
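To see which offset parsedate_tz() actually assigned, you can look at the tuple it returns: the first nine items follow the time.struct_time layout and the tenth is the UTC offset in seconds. A quick check, using the same string as above:

```python
from email.utils import parsedate_tz

parsed = parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT")

# Items 0-5 are year, month, day, hour, minute, second.
print(parsed[:6])             # (2014, 10, 16, 1, 16, 17)

# Item 9 is the offset from UTC in seconds that the email package chose for "EDT".
offset_seconds = parsed[9]
print(offset_seconds)         # -14400
print(offset_seconds / 3600)  # -4.0
```

If the offset printed here is not the one your feed means, the timestamp computed from it will be off by the same amount.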

Related Python bug: %Z in strptime doesn't match EST and others.
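For context on why the original strptime() call fails: the datetime documentation says that %Z in strptime() only accepts UTC, GMT and the names in time.tzname for your machine's local zone, and even a successful match does not attach a tzinfo. A small sketch of that behaviour (whether the EDT branch raises depends on your local time zone settings):

```python
import time
from datetime import datetime

fmt = "%a, %d %b %Y %H:%M:%S %Z"

# "UTC" is always accepted by %Z, but the result is still a naive datetime.
utc_naive = datetime.strptime("Thu, 16 Oct 2014 01:16:17 UTC", fmt)
print(utc_naive, utc_naive.tzinfo)  # 2014-10-16 01:16:17 None

# "EDT" only matches if it happens to appear in the local time.tzname.
print(time.tzname)
try:
    datetime.strptime("Thu, 16 Oct 2014 01:16:17 EDT", fmt)
except ValueError as exc:
    print("ValueError:", exc)
```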

If your computer uses POSIX timestamps (likely), you are sure the input date is within the range your system supports (not too far in the future or past), and you don't need to preserve microsecond precision, then you could use datetime.utcfromtimestamp:

from datetime import datetime
from email.utils import parsedate_tz, mktime_tz

timestamp = mktime_tz(parsedate_tz("Thu, 16 Oct 2014 01:16:17 EDT"))
# -> 1413436577
utc_dt = datetime.utcfromtimestamp(timestamp)
# -> datetime.datetime(2014, 10, 16, 5, 16, 17)
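On Python 3 you might prefer a timezone-aware result: datetime.utcfromtimestamp() returns a naive datetime and has been deprecated since Python 3.12. A sketch of two alternatives, assuming Python 3.3 or later (needed for email.utils.parsedate_to_datetime()):

```python
from datetime import datetime, timezone
from email.utils import mktime_tz, parsedate_to_datetime, parsedate_tz

text = "Thu, 16 Oct 2014 01:16:17 EDT"

# Option 1: same POSIX timestamp as above, turned into an aware UTC datetime.
timestamp = mktime_tz(parsedate_tz(text))
utc_dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(utc_dt)                             # 2014-10-16 05:16:17+00:00

# Option 2: keep the offset from the string and convert to UTC on demand.
local_dt = parsedate_to_datetime(text)
print(local_dt)                           # 2014-10-16 01:16:17-04:00
print(local_dt.astimezone(timezone.utc))  # 2014-10-16 05:16:17+00:00
```

Both options rely on the same abbreviation table as parsedate_tz(), so the ambiguity caveat above still applies.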
jfs
  • @user2629723 if the answer helped solve your problem, please indicate as such by selecting the check mark next to it. Doing this shows that your issue has been resolved, and also awards reputation to both you and the answerer. See "[How does accepting an answer work?](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work)" for more info. – MattDMo Oct 18 '14 at 03:49
  • from python3.3 onwards you can use https://docs.python.org/3.6/library/datetime.html#datetime.datetime.utcfromtimestamp so you don't need to add the offset to make it a little easier – amohr Jun 25 '18 at 19:43
  • @amohr 1- utcfromtimestamp() is available forever (long before Python 3.3) 2- datetime + timedelta may have a more portable date range and it shows the relationship between POSIX timestamp and UTC time explicitly. Though in most cases, both methods are interchangeable and the usage of utcfromtimestamp() has fewer moving parts (and therefore it is preferable). – jfs Jun 26 '18 at 09:11
  • @amohr, datetime.utcfromtimestamp is available in 2.7. It just changed behaviour a little in 3.3. – Javier Nov 14 '18 at 21:37
  • Thanks @jfs for the caveats! – Javier Nov 15 '18 at 22:17

The email.utils.parsedate_tz() solution is good for the 3-letter timezone abbreviations it recognizes (such as EDT), but it does not work for 4-letter ones such as AEDT or CEST. If you need a mix, the answer under Parsing date/time string with timezone abbreviated name in Python? handles both for the most commonly used time zones.
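If you only need a handful of 4-letter abbreviations, one workaround is to split the abbreviation off yourself and look its offset up in a table you control. This is a minimal sketch, not the method from the linked answer: TZ_OFFSETS and parse_with_abbreviation() are hypothetical names, the offsets listed are assumptions you would adapt to your own feeds (abbreviations are ambiguous, as noted above), and %a/%b assume an English or C locale.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table; pick the meaning of each abbreviation for your data.
TZ_OFFSETS = {
    "EDT": timedelta(hours=-4),
    "AEDT": timedelta(hours=11),
    "CEST": timedelta(hours=2),
}

def parse_with_abbreviation(text):
    """Hypothetical helper: parse 'Thu, 16 Oct 2014 01:16:17 CEST' via TZ_OFFSETS."""
    # Split off the trailing abbreviation; strptime() handles the rest.
    stamp, _, abbrev = text.rpartition(" ")
    naive = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S")
    try:
        offset = TZ_OFFSETS[abbrev.upper()]
    except KeyError:
        raise ValueError(f"unknown timezone abbreviation: {abbrev!r}") from None
    # Attach a fixed-offset tzinfo so the result is timezone-aware.
    return naive.replace(tzinfo=timezone(offset))

dt = parse_with_abbreviation("Thu, 16 Oct 2014 01:16:17 CEST")
print(dt.isoformat())                           # 2014-10-16T01:16:17+02:00
print(dt.astimezone(timezone.utc).isoformat())  # 2014-10-15T23:16:17+00:00
```

A fixed-offset table like this cannot tell that, say, CEST only applies in summer; it simply trusts the abbreviation in the string.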

DrDaveD