3

I have a script that is parsing out fields within email headers that represent dates and times. Some examples of these strings are as follows:

Fri, 10 Jun 2011 11:04:17 +0200 (CEST)
Tue, 1 Jun 2011 11:04:17 +0200
Wed, 8 Jul 1992 4:23:11 -0200
Wed, 8 Jul 1992 4:23:11 -0200 EST

Before I was confronted with the CEST/EST portions at the ends of some the strings I had things working pretty well just using datetime.datetime.strptime like this:

msg['date'] = 'Wed, 8 Jul 1992 4:23:11 -0200'
mail_date = datetime.datetime.strptime(msg['date'][:-6], '%a, %d %b %Y %H:%M:%S')

I tried to put a regex together to match the date portions of the string while excluding the timezone information at the end, but I was having issues with the regex (I couldn't match a colon).

Is using a regex the best way to parse all of the examples above? If so, could someone share a regex that would match these examples? In the end I am looking to have a datetime object.

ajt
  • 1,341
  • 3
  • 13
  • 30

2 Answers2

7

From python time to age part 2, timezones:

from email import utils
utils.parsedate_tz('Fri, 10 Jun 2011 11:04:17 +0200 (CEST)') 
utils.parsedate_tz('Fri, 10 Jun 2011 11:04:17 +0200')
utils.parsedate_tz('Fri, 10 Jun 2011 11:04:17')

The output is:

(2011, 6, 10, 11, 4, 17, 0, 1, -1, 7200)
(2011, 6, 10, 11, 4, 17, 0, 1, -1, 7200)
(2011, 6, 10, 11, 4, 17, 0, 1, -1, None)
Community
  • 1
  • 1
Ruggiero Spearman
  • 6,735
  • 5
  • 26
  • 37
  • I saw that the old rfc822 module had similar functionality but I was not aware of email.utils. Thank you. – ajt Jun 10 '11 at 14:31
2

Perhaps I misunderstood your question, but wont a simple split suffice?

#!/usr/bin/python

d = ["Fri, 10 Jun 2011 11:04:17 +0200 (CEST)", "Tue, 1 Jun 2011 11:04:17 +0200", 
     "Wed, 8 Jul 1992 4:23:11 -0200", "Wed, 8 Jul 1992 4:23:11 -0200 EST"]

for i in d:
    print " ".join(i.split()[0:5])


Fri, 10 Jun 2011 11:04:17
Tue, 1 Jun 2011 11:04:17
Wed, 8 Jul 1992 4:23:11
Wed, 8 Jul 1992 4:23:11
Fredrik Pihl
  • 44,604
  • 7
  • 83
  • 130