
Is there an easy way to parse HTTP date-strings in Python? According to the standard, there are several ways to format HTTP date strings; the method should be able to handle this.

In other words, I want to convert a string like "Wed, 23 Sep 2009 22:15:29 GMT" to a Python time structure.

Sridhar Ratnakumar
Troels Arvin

4 Answers

>>> import email.utils, datetime
>>> email.utils.parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, -1)

If you want a datetime.datetime object, you can do:

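# Python <3.3
import datetime
import email.utils as eut

def my_parsedate(text):
    # Keep (year, month, day, hour, minute, second); any UTC offset is dropped
    return datetime.datetime(*eut.parsedate(text)[:6])

# Python ≥3.3
import email.utils

def my_parsedate(text):
    return email.utils.parsedate_to_datetime(text)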

email.utils.parsedate

Attempts to parse a date according to the rules in RFC 2822. However, some mailers don't follow that format as specified, so parsedate() tries to guess correctly in such cases. date is a string containing an RFC 2822 date, such as "Mon, 20 Nov 1995 19:12:08 -0500". If it succeeds in parsing the date, parsedate() returns a 9-tuple that can be passed directly to time.mktime(); otherwise None will be returned. Note that indexes 6, 7, and 8 of the result tuple are not usable.

email.utils.parsedate_to_datetime

The inverse of format_datetime(). Performs the same function as parsedate(), but on success returns a datetime; otherwise ValueError is raised if date contains an invalid value such as an hour greater than 23 or a timezone offset not between -24 and 24 hours. If the input date has a timezone of -0000, the datetime will be a naive datetime, and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the datetime will be an aware datetime with the corresponding timezone tzinfo.
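A minimal sketch (assuming Python 3.3 or later) of the three timezone cases the parsedate_to_datetime() documentation above describes. The sample strings reuse the example date from the parsedate() documentation, with only the zone varied:

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# "GMT" is recognised as a zero offset, so the result is aware (UTC).
gmt = parsedate_to_datetime('Wed, 23 Sep 2009 22:15:29 GMT')

# A numeric offset is preserved as a fixed-offset tzinfo on an aware datetime.
est = parsedate_to_datetime('Mon, 20 Nov 1995 19:12:08 -0500')

# "-0000" means "UTC, but source zone unknown", which yields a naive datetime.
unknown = parsedate_to_datetime('Mon, 20 Nov 1995 19:12:08 -0000')

print(gmt.tzinfo)                    # UTC
print(est.astimezone(timezone.utc))  # 1995-11-21 00:12:08+00:00
print(unknown.tzinfo)                # None
```

If you need every result to be aware, you have to decide yourself how to treat the naive "-0000" case (for HTTP, where dates are meant to be GMT, attaching UTC is one reasonable choice).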

tzot
  • Yep, parsedate's probably the best compromise, though its "tolerant RFC 2822 parsing" is not 100% compatible with RFC 2616's demanding "MUST" -- e.g., epic fail on RFC 850 format with two-digit years, such as `Sunday, 06-Nov-94 08:49:37 GMT`, yet 2616 says a client MUST be able to parse RFC 850 dates (sigh). – Alex Martelli Sep 24 '09 at 15:19
  • email.Utils.parsedate seems sufficient, thanks. But it's confusing that it's sometimes called email.utils, and sometimes email.Utils. I guess that the email.Utils version is an old legacy variant which has been deprecated(?) – Troels Arvin Sep 24 '09 at 20:43
  • `email.utils.parsedate is email.Utils.parsedate -> True` It seems that *U*tils is a lazy loader. – jfs Sep 24 '09 at 22:24
  • Also note that email.utils.parsedate() returns a tuple that can be passed directly to time.mktime() (this gives you the number of seconds since the epoch on your computer (local time, not UTC)). – driax Jun 15 '10 at 04:00
  • @driax: seconds since the Epoch don't depend on the local timezone, e.g., `0` means `1970-01-01T00:00:00Z` -- it is the same instant around the world (local clocks show different values but the timestamp is exactly the same). Unless the input time string is in UTC (GMT), you should [use `mktime_tz(parsedate_tz())` instead](http://stackoverflow.com/a/26435566/4279); otherwise the timezone information is lost. – jfs Oct 22 '14 at 04:19
  • @J.F.Sebastian you're absolutely right. Not sure what I was trying to say with "local time". I was probably frustrated that I hadn't found a mktime_tz function (what the heck is that doing in email.utils). Oh well :) – driax Oct 23 '14 at 01:32
  • In more recent versions of Python you can use `email.utils.parsedate_to_datetime`. – mgilbert Oct 19 '18 at 17:30
  • Let's keep code readable. If I found 'eut' in code, I'd have to dig around to find out what it is. I suggest you simply do `from email.utils import parsedate` (or, now that I've read the previous comment by @mgilbert, `from email.utils import parsedate_to_datetime`). – Michael Scheper Oct 23 '18 at 20:21
  • Also see https://stackoverflow.com/a/8339750/14558 for the version that includes timezone parsing. – andrewdotn Mar 07 '19 at 18:09
  • It's quite concerning that this has so many upvotes despite the fact that it ignores the timezone. Not all HTTP servers / clients will use `GMT`. Consider using [this answer](https://stackoverflow.com/a/59416334/453851) instead! – Philip Couling Nov 01 '22 at 10:04
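Regarding the timezone concerns raised in these comments: a short sketch, assuming Python 3, that contrasts parsedate(), which drops the offset, with the mktime_tz(parsedate_tz()) combination jfs suggests. The input is the example string from the parsedate() documentation quoted in the answer:

```python
from datetime import datetime, timezone
from email.utils import mktime_tz, parsedate, parsedate_tz

s = 'Mon, 20 Nov 1995 19:12:08 -0500'

# parsedate() drops the offset: this naive datetime says 19:12 with no zone,
# which is five hours off if you go on to treat it as UTC.
naive = datetime(*parsedate(s)[:6])

# parsedate_tz() keeps the offset (in seconds) as a 10th element, and
# mktime_tz() uses it to compute a correct POSIX timestamp.
parsed = parsedate_tz(s)
ts = mktime_tz(parsed)
utc = datetime.fromtimestamp(ts, tz=timezone.utc)

print(parsed[9])  # -18000
print(naive)      # 1995-11-20 19:12:08
print(utc)        # 1995-11-21 00:12:08+00:00
```

Converting the timestamp back with an explicit tz=timezone.utc keeps the result independent of the machine's local timezone.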

Since Python 3.3 there's email.utils.parsedate_to_datetime, which can parse RFC 5322 timestamps, including IMF-fixdate (the Internet Message Format fixed-length format, the preferred form of RFC 7231's HTTP-date).

>>> from email.utils import parsedate_to_datetime
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> parsedate_to_datetime(s)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)

There's also the undocumented http.cookiejar.http2time, which can achieve the same:

>>> from datetime import datetime, timezone
>>> from http.cookiejar import http2time
>>> s = 'Sun, 06 Nov 1994 08:49:37 GMT'
>>> datetime.fromtimestamp(http2time(s), tz=timezone.utc)
datetime.datetime(1994, 11, 6, 8, 49, 37, tzinfo=datetime.timezone.utc)

It was introduced in Python 2.4 as cookielib.http2time for handling the cookie Expires attribute, whose value is expressed in the same format.
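Because http2time is undocumented, a thin wrapper that checks its None return value can be handy. This is an illustration only: parse_http_date is a hypothetical helper name, and the sketch relies on http2time's current behaviour of returning seconds since the epoch, or None for input it cannot parse:

```python
from datetime import datetime, timezone
from http.cookiejar import http2time


def parse_http_date(text):
    """Hypothetical helper: aware UTC datetime, or None if unparseable.

    http.cookiejar.http2time is undocumented and may change without notice.
    """
    ts = http2time(text)  # seconds since the epoch, or None on failure
    if ts is None:
        return None
    return datetime.fromtimestamp(ts, tz=timezone.utc)


print(parse_http_date('Sun, 06 Nov 1994 08:49:37 GMT'))      # 1994-11-06 08:49:37+00:00
print(parse_http_date('Tuesday, 08-Feb-1994 14:15:29 GMT'))  # 1994-02-08 14:15:29+00:00
print(parse_http_date('not a date'))                         # None
```

Besides the strict IMF-fixdate form, http2time also accepts looser variants such as the RFC 850 style with a four-digit year shown above. For two-digit years it picks the century relative to the current year, so results for such input can shift over time.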

saaj

>>> import datetime
>>> datetime.datetime.strptime('Wed, 23 Sep 2009 22:15:29 GMT', '%a, %d %b %Y %H:%M:%S GMT')
datetime.datetime(2009, 9, 23, 22, 15, 29)
SilentGhost
  • Yes, and it's fairly easy to extend to handle any format. While `email.utils.parsedate` is more robust, it's also less transparent. – SilentGhost Sep 24 '09 at 16:42
  • +1 and thanks (even though we're asked to avoid such comments). Much clearer than "utils"-named modules. – user237419 Feb 19 '14 at 17:54
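Following up on the point that strptime() is easy to extend: a sketch (not a complete RFC 7231 implementation) that tries the three HTTP-date forms in turn and attaches UTC, since strptime() returns a naive datetime here. parse_http_date and HTTP_DATE_FORMATS are hypothetical names, and the patterns assume the default C locale, because %a, %A and %b match locale-dependent day and month names:

```python
from datetime import datetime, timezone

# The three HTTP-date forms from RFC 7231, section 7.1.1.1 (C locale assumed).
HTTP_DATE_FORMATS = (
    '%a, %d %b %Y %H:%M:%S GMT',  # IMF-fixdate: Sun, 06 Nov 1994 08:49:37 GMT
    '%A, %d-%b-%y %H:%M:%S GMT',  # obsolete RFC 850: Sunday, 06-Nov-94 08:49:37 GMT
    '%a %b %d %H:%M:%S %Y',       # ANSI C asctime(): Sun Nov  6 08:49:37 1994
)


def parse_http_date(text):
    """Try each format in turn; return an aware UTC datetime or raise ValueError."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            naive = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # HTTP-date is defined in GMT, so attach the zone strptime can't infer.
        return naive.replace(tzinfo=timezone.utc)
    raise ValueError(f'unrecognised HTTP date: {text!r}')


for s in ('Sun, 06 Nov 1994 08:49:37 GMT',
          'Sunday, 06-Nov-94 08:49:37 GMT',
          'Sun Nov  6 08:49:37 1994'):
    print(parse_http_date(s))  # 1994-11-06 08:49:37+00:00 each time
```

Python's %y maps 69-99 to the 1900s and 00-68 to the 2000s. That is not quite the rule RFC 7231 asks recipients to apply to two-digit RFC 850 years (a year that appears more than 50 years in the future is taken as the most recent past year with the same last two digits), so treat that branch as approximate.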
httplib.HTTPMessage(filehandle).getdate(headername)
httplib.HTTPMessage(filehandle).getdate_tz(headername)
mimetools.Message(filehandle).getdate()
rfc822.parsedate(datestr)
rfc822.parsedate_tz(datestr)
  • If you have a raw data stream, you can build an HTTPMessage or a mimetools.Message from it; it may offer additional help when querying the response object for information.
  • If you are using urllib2, you already have an HTTPMessage object hidden in the file-like object returned by urlopen (see .info()).
  • It can probably parse many date formats.
  • httplib is in the standard library.

NOTE:

  • Having looked at the implementation: HTTPMessage inherits from mimetools.Message, which inherits from rfc822.Message. Two module-level functions in rfc822 may also interest you: parsedate and parsedate_tz.
  • parsedate(_tz) in email.utils has a different implementation, although it behaves much the same.

If you only have the date string itself and want to parse it, you can do this:

>>> from rfc822 import parsedate, parsedate_tz
>>> parsedate('Wed, 23 Sep 2009 22:15:29 GMT')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

But let me illustrate it with MIME messages:

>>> import mimetools
>>> import StringIO
>>> m = mimetools.Message(
...     StringIO.StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> m
<mimetools.Message instance at 0x7fc259146710>
>>> m.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

Or via HTTP messages (responses):

>>> from httplib import HTTPMessage
>>> from StringIO import StringIO
>>> http_response = HTTPMessage(StringIO('Date:Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n'))
>>> #http_response can be grabbed via urllib2.urlopen(url).info(), right?
>>> http_response.getdate('Date')
(2009, 9, 23, 22, 15, 29, 0, 1, 0)

right?

>>> import urllib2
>>> urllib2.urlopen('https://fw.io/').info().getdate('Date')
(2014, 2, 19, 18, 53, 26, 0, 1, 0)

There, now we know more about date formats, MIME messages, mimetools and their Pythonic implementation ;-)

Whatever the case, this looks better than using email.utils for parsing HTTP headers.

user237419
  • It seems that, as of now (Dec. 2016), rfc822 is deprecated; per the documentation, the email package is the preferred approach. https://docs.python.org/2/library/rfc822.html – StanleyZ Dec 29 '16 at 03:24
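rfc822, mimetools, StringIO, httplib and urllib2 exist only in Python 2. A rough Python 3 counterpart of the header-based approach, assuming the same sample header, combines http.client.parse_headers() with email.utils.parsedate_to_datetime():

```python
import io
from email.utils import parsedate_to_datetime
from http.client import parse_headers

# parse_headers() reads header lines from a *binary* file object and returns
# an http.client.HTTPMessage (a subclass of email.message.Message).
raw = io.BytesIO(b'Date: Wed, 23 Sep 2009 22:15:29 GMT\r\n\r\n')
headers = parse_headers(raw)

# HTTPMessage has no getdate() in Python 3; look the header up and parse it.
date = parsedate_to_datetime(headers['Date'])
print(date)  # 2009-09-23 22:15:29+00:00

# With a live response, urllib.request.urlopen(url).headers (or .info())
# is already an HTTPMessage, so the same lookup applies:
#   parsedate_to_datetime(response.headers['Date'])
```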