I'm having problems decoding the emails that I'm fetching.

The script should log on to an email account, get the unread messages, and later store them in a database. I only want the plain text of each email, without any of the HTML.

I have found many examples, but none of them seem to work. I have tried this and this and some more I have found.

The code I have now:

import imaplib, sys, email
import email.parser

myparser = email.parser.Parser()
conn = imaplib.IMAP4_SSL(host='mail.something.com')

retcode, capabilities = conn.login('username', 'XXXXX')

conn.select('Inbox', readonly = 1) # Select inbox as read-only
retcode, messages = conn.search(None, '(UNSEEN)')
if retcode == 'OK':
    for message in messages[0].split(' '):
        if message == '':
            continue
        ret, data = conn.fetch(message, '(RFC822)')
        msg = email.message_from_string(data[0][1])
#        rootMessage = myparser.parse(data[0][1])

#        print 'Message %s\n%s\n' % (message, rootMessage)
        print msg
        print '---------------------------------------------------------------' 

conn.close()

As you can see, there is no decoding in this, because everything I have tried has failed.

I am very new to Python, so if someone could steer me in the right direction I would really appreciate it. A hack would be all right, since it's not a mission-critical script, but a generic solution would be best.

-G

UPDATE:

There is no error; the problem is that the output is not decoded correctly.

Example input:

 This is a test message.

 Gísli

Output:

 This is a test message.

 G=EDsli
Gisli

1 Answer

This can help:

import quopri

print quopri.decodestring(msg).decode('utf8')

Or this:

import base64  

body = base64.b64decode(msg) 
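
As a rough sketch of the first suggestion applied to the sample from the question: `=ED` is a quoted-printable escape for the single byte 0xED, which is `í` in ISO-8859-1 but not a valid UTF-8 sequence on its own. The example below is written for Python 3 (unlike the Python 2 snippets above), and the ISO-8859-1 charset is an assumption; the real value should come from the message's Content-Type header.

```python
import quopri

# Quoted-printable body as it appears in the question's output.
raw = b"This is a test message.\n\nG=EDsli"

# decodestring() only undoes the =XX escapes and returns bytes;
# the character set still has to be applied separately.
decoded_bytes = quopri.decodestring(raw)

# Assumption: the message charset is ISO-8859-1.
text = decoded_bytes.decode("iso-8859-1")
print(text)
```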
stalk
  • I get an error if I try this: `UnicodeDecodeError: 'utf8' codec can't decode bytes in position 222-227: unsupported Unicode code range`. Any ideas? – Gisli May 25 '12 at 13:59
  • Try removing `.decode('utf8')`, or replace `'utf8'` with the encoding of your email message – stalk May 25 '12 at 14:45
  • Thanks. This is getting me somewhere. I had forgotten to change the encoding to iso-8859-1. – Gisli May 25 '12 at 14:49
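
For the more generic solution the question asks for, one possible approach is to let the standard `email` package undo the transfer encoding and report the charset of each part, keeping only the `text/plain` parts. This is a minimal sketch, assuming Python 3; the helper name `extract_plain_text`, its `fallback_charset` parameter, and the sample message are made up for illustration and are not part of the original thread.

```python
import email


def extract_plain_text(raw_bytes, fallback_charset="iso-8859-1"):
    """Return the decoded text/plain parts of a raw RFC 822 message."""
    msg = email.message_from_bytes(raw_bytes)
    chunks = []
    for part in msg.walk():
        # Skip multipart containers and non-plain parts such as text/html.
        if part.get_content_type() != "text/plain":
            continue
        # decode=True undoes quoted-printable or base64 and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Use the charset declared by the part, or the assumed fallback.
        charset = part.get_content_charset() or fallback_charset
        chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


# Hypothetical message resembling the one in the question.
sample = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"This is a test message.\r\n"
    b"\r\n"
    b"G=EDsli\r\n"
)
print(extract_plain_text(sample))
```

On Python 3, `data[0][1]` returned by `conn.fetch(message, '(RFC822)')` is already a bytes object, so it could be passed to a helper like this directly.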