I have a script that is parsing out fields within email headers that represent dates and times. Some examples of these strings are as follows:
Fri, 10 Jun 2011 11:04:17 +0200 (CEST)
Tue, 1 Jun 2011 11:04:17 +0200
Wed, 8 Jul 1992 4:23:11 -0200
Wed, 8 Jul 1992 4:23:11 -0200 EST
Before I was confronted with the CEST/EST portions at the ends of some the strings I had things working pretty well just using datetime.datetime.strptime
like this:
msg['date'] = 'Wed, 8 Jul 1992 4:23:11 -0200'
mail_date = datetime.datetime.strptime(msg['date'][:-6], '%a, %d %b %Y %H:%M:%S')
I tried to put a regex together to match the date portions of the string while excluding the timezone information at the end, but I was having issues with the regex (I couldn't match a colon).
Is using a regex the best way to parse all of the examples above? If so, could someone share a regex that would match these examples? In the end I am looking to have a datetime object.