
I am trying to parse the date and time of an email using a Python script.

When I open the mail details, the date value looks like this:

from:    abcd@xyz.com
to:      def@xyz.com
date:    Tue, Aug 28, 2012 at 1:19 PM
subject: Subject of that mail

I am using code like this:

import email
from email.utils import parseaddr

# str1 holds the raw text of the message
mail = email.message_from_string(str1)
#to = re.sub('</br>','',mail["To"])
to = parseaddr(mail.get('To'))[1]
sender = parseaddr(mail.get('From'))[1]
cc_is = parseaddr(mail.get('Cc'))[1]
date = mail["Date"]
print(date)
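For reference, a self-contained version of that snippet. The sample message is made up (it stands in for str1), but it shows that mail["Date"] returns the raw Date header exactly as the sender wrote it, offset included; no timezone conversion has happened yet.

```python
import email
from email.utils import parseaddr

# Hypothetical raw message standing in for str1
str1 = (
    "From: abcd@xyz.com\n"
    "To: def@xyz.com\n"
    "Date: Tue, 28 Aug 2012 02:49:13 -0500\n"
    "Subject: Subject of that mail\n"
    "\n"
    "Body text\n"
)

mail = email.message_from_string(str1)
to = parseaddr(mail.get('To'))[1]        # just the address part
sender = parseaddr(mail.get('From'))[1]
date = mail["Date"]                      # raw header value, offset and all
print(date)  # Tue, 28 Aug 2012 02:49:13 -0500
```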

Whereas the output for the same mail's date, parsed with Python, looks like this, with a timezone offset:

Tue, 28 Aug 2012 02:49:13 -0500

But I am actually hoping for:

Tue, Aug 28, 2012 at 1:19 PM

I am confused about the relationship between these two values. Can anybody help me figure it out? I need to get the same time as shown in the mail details.

chirag ghiyad



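When you look at an email in Gmail, your local timezone is used to display the date and time the email was sent. The header value "Tue, 28 Aug 2012 02:49:13 -0500" is parsed, converted to your local timezone, and then formatted in a Gmail-specific way. For example, 02:49:13 at UTC-5 is 07:49:13 UTC, which is 13:19:13 in Asia/Kolkata (UTC+5:30): the "1:19 PM" shown in the mail details.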
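A minimal sketch of that conversion with the standard library (Python 3.2+), assuming the viewer's timezone is Asia/Kolkata and modelling it as a fixed +05:30 offset rather than a full timezone database entry:

```python
from datetime import datetime, timedelta, timezone

# The Date header's value as a timezone-aware datetime (UTC-05:00)
sent = datetime(2012, 8, 28, 2, 49, 13, tzinfo=timezone(timedelta(hours=-5)))

# Fixed +05:30 offset, an assumed stand-in for the viewer's zone (Asia/Kolkata)
ist = timezone(timedelta(hours=5, minutes=30))

local = sent.astimezone(ist)
print(local)          # 2012-08-28 13:19:13+05:30
print(local == sent)  # True: same instant, different wall-clock time
```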

Parsing and formatting the stdlib way

The email.utils module includes a parsedate_tz() function that specifically deals with email headers with timezone offsets.

It returns a tuple compatible with time.struct_time, but with the timezone offset (in seconds) added as a tenth item. An additional mktime_tz() function converts that tuple to a UTC timestamp (time in seconds since the UNIX epoch). That value can then easily be converted to a datetime.datetime() object.
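A short sketch of those two steps on Python 3. Passing tz=timezone.utc to fromtimestamp() keeps the result aware and in UTC, instead of silently converting it to the machine's local time:

```python
from datetime import datetime, timezone
from email.utils import parsedate_tz, mktime_tz

tt = parsedate_tz('Tue, 28 Aug 2012 02:49:13 -0500')
# The first six fields are the wall-clock time as written in the header;
# index 9 is the UTC offset in seconds (-5 hours here).
print(tt[:6], tt[9])  # (2012, 8, 28, 2, 49, 13) -18000

timestamp = mktime_tz(tt)  # seconds since the epoch, offset applied
dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(dt)  # 2012-08-28 07:49:13+00:00
```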

The same module also has a formatdate() function to convert the UNIX epoch timestamp to an email-compatible date string:

>>> from email.utils import parsedate_tz, mktime_tz, formatdate
>>> date = 'Tue, 28 Aug 2012 02:49:13 -0500'
>>> tt = parsedate_tz(date)
>>> timestamp = mktime_tz(tt)
>>> print(formatdate(timestamp))
Tue, 28 Aug 2012 07:49:13 -0000

Now we have a formatted date in UTC, suitable for outgoing emails. To have it printed in my local timezone (as determined by my computer), set the localtime flag to True:

>>> print(formatdate(timestamp, True))
Tue, 28 Aug 2012 08:49:13 +0100

Parsing and formatting using better tools

Note that things get hairy as we try to deal with timezones, and the formatdate() function doesn't give you any options to format things a little differently (like Gmail does), nor does it let you choose a different timezone to work with.

Enter the external python-dateutil module; it has a parse() function that can handle just about anything, and supports timezones properly:

>>> import dateutil.parser
>>> dt = dateutil.parser.parse(date)
>>> dt
datetime.datetime(2012, 8, 28, 2, 49, 13, tzinfo=tzoffset(None, -18000))

The parse() function returns a datetime.datetime() instance, which makes formatting a lot easier. Now we can use the .strftime() method to output this the way your email client does:

>>> print(dt.strftime('%a, %b %d, %Y at %I:%M %p'))
Tue, Aug 28, 2012 at 02:49 AM

That's still in the email's original timezone (UTC-5), of course; to convert it to your timezone instead, use the .astimezone() method with a new tzinfo object. The python-dateutil package provides some handy ones.

Here is how you print it in the local timezone of your machine:

>>> import dateutil.tz
>>> print(dt.astimezone(dateutil.tz.tzlocal()).strftime('%a, %b %d, %Y at %I:%M %p'))
Tue, Aug 28, 2012 at 09:49 AM

or use a specific timezone instead:

>>> print(dt.astimezone(dateutil.tz.gettz('Asia/Kolkata')).strftime('%a, %b %d, %Y at %I:%M %p'))
Tue, Aug 28, 2012 at 01:19 PM
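On Python 3.3 and later, roughly the same result is available from the standard library alone: email.utils.parsedate_to_datetime() returns an aware datetime directly. This sketch again uses a fixed +05:30 offset as an assumed stand-in for Asia/Kolkata; on Python 3.9+, zoneinfo.ZoneInfo('Asia/Kolkata') would be the named-zone equivalent, provided the system has timezone data.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Parse the header straight into an aware datetime (UTC-05:00)
dt = parsedate_to_datetime('Tue, 28 Aug 2012 02:49:13 -0500')

# Assumed target zone as a fixed offset; any tzinfo object works here
target = timezone(timedelta(hours=5, minutes=30))
shown = dt.astimezone(target).strftime('%a, %b %d, %Y at %I:%M %p')
print(shown)  # Tue, Aug 28, 2012 at 01:19 PM
```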
Martijn Pieters
  • Actually I am asking about the output of the parsed mail, which I can capture, like CC, TO or SENDER. You showed me how to format that datetime string, which I am already comfortable with. – chirag ghiyad Aug 28 '12 at 13:27
  • Right, you are confused about the timezones I think; the email date is parsed as one timezone, displayed in another. This is usually not a problem. – Martijn Pieters Aug 28 '12 at 13:28
  • Note that your question is far from clear; it is not clear where you see the values and what you expected. – Martijn Pieters Aug 28 '12 at 13:32
  • @Martijn: now is it clear? I am not able to get the string "Tue, Aug 28, 2012 at 1:19 PM", which I can see in my mail when I open it in my browser in my Gmail account. – chirag ghiyad Aug 28 '12 at 13:33
  • 13:19:21 Tuesday August 28, 2012 in Asia/Kolkata converts to 02:49:21 Tuesday August 28, 2012 in GMT-5, so that is my final answer. Thanks @Martijn for your help. – chirag ghiyad Aug 28 '12 at 14:19
  • there are parsedate_tz and mktime_tz i.e., you don't need to parse the UTC offset by hand – jfs Apr 17 '14 at 00:08
  • @J.F.Sebastian: thanks; not sure how I missed those at the time. It was not as if those functions have been added after I posted this.. :-/ – Martijn Pieters Apr 17 '14 at 12:02

You could do it using only stdlib:

>>> from email.utils import parsedate_tz, mktime_tz, formatdate
>>> ts = mktime_tz(parsedate_tz('Tue, 28 Aug 2012 02:49:13 -0500'))
>>> formatdate(ts, localtime=True) # assuming Asia/Kolkata is the local timezone
'Tue, 28 Aug 2012 13:19:13 +0530'

If you want a 12-hour clock with AM/PM:

>>> from datetime import datetime
>>> datetime.fromtimestamp(ts).strftime('%a, %b %d, %Y at %I:%M %p')
'Tue, Aug 28, 2012 at 01:19 PM'
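Two details still differ from what Gmail shows: %I zero-pads the hour ("01:19" rather than "1:19"), and fromtimestamp() without a tz argument uses whatever timezone the machine is set to. A sketch that pins the zone explicitly (a fixed +05:30 offset, an assumption) and builds the hour by hand; gmail_style is a hypothetical helper name, and the platform-specific %-I (glibc) and %#I (Windows) flags are avoided because they are not portable:

```python
from datetime import datetime, timedelta, timezone
from email.utils import mktime_tz, parsedate_tz

def gmail_style(dt):
    """Hypothetical helper: format like 'Tue, Aug 28, 2012 at 1:19 PM'."""
    hour12 = dt.hour % 12 or 12  # 0 -> 12 (AM), 13 -> 1 (PM); no zero padding
    return '{} at {}:{}'.format(dt.strftime('%a, %b %d, %Y'), hour12, dt.strftime('%M %p'))

ts = mktime_tz(parsedate_tz('Tue, 28 Aug 2012 02:49:13 -0500'))
ist = timezone(timedelta(hours=5, minutes=30))  # assumed viewer zone (Asia/Kolkata)
dt = datetime.fromtimestamp(ts, tz=ist)
print(gmail_style(dt))  # Tue, Aug 28, 2012 at 1:19 PM
```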
jfs
  • J.F. Sebastian is correct, just one typo: instead of datetime he needs datetime.datetime; the example should be datetime.datetime.fromtimestamp(ts).strftime('%a, %b %d, %Y at %I:%M %p') – Dung Sep 22 '15 at 19:40
  • @Dung: the code works as is. Look at the import line. – jfs Sep 22 '15 at 20:10
  • In addition, if you want a date format that matches the MySQL datetime format, here it is: >>> datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S') – Dung Sep 22 '15 at 21:20
  • @Dung: again, the code works as is. It produces the time format that the OP requested *explicitly*. Look at the question. – jfs Sep 22 '15 at 21:24

One could also opt for the following code, which pulls the date text straight out of the mail details as displayed:

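# f holds the mail-details text (from:, to:, date:, subject:) as a string
start = f.find('date:') + len('date:')  # skip past the 'date:' label itself
end = f.find('subject:', start)  # the date value ends where 'subject:' begins
date_time = f[start:end].strip()  # drop the padding spaces and the newline
print(date_time)  # prints "Tue, Aug 28, 2012 at 1:19 PM"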
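This only works on the text as displayed in the browser, not on the raw Date header. If you then need a datetime object, the extracted string can be parsed with strptime(); a sketch assuming an English (C) locale, since %a, %b and %p are locale-dependent. The result is naive: it carries no timezone, only the wall-clock time in whatever zone Gmail displayed, and the seconds are gone.

```python
from datetime import datetime

# Text extracted from the mail-details view (Gmail's display format)
date_time = 'Tue, Aug 28, 2012 at 1:19 PM'

# %I accepts a one- or two-digit hour; combined with %p, "1 PM" becomes 13
parsed = datetime.strptime(date_time, '%a, %b %d, %Y at %I:%M %p')
print(parsed)  # 2012-08-28 13:19:00
```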