I'm having problems decoding the emails that I'm fetching.
The script should log on to an email account, get the unread messages and later store them in a database. I only want the actual text of each email, without any of the HTML.
I have found many examples, but none of them seem to work. I have tried this and this and some more I have found.
The code I have now:
import imaplib, sys, email
import email.parser
myparser = email.parser.Parser()
conn = imaplib.IMAP4_SSL(host='mail.something.com')
retcode, capabilities = conn.login('username', 'XXXXX')
conn.select('Inbox', readonly = 1) # Select inbox as read-only
retcode, messages = conn.search(None, '(UNSEEN)')
if retcode == 'OK':
    for message in messages[0].split(' '):
        if message == '':
            continue
        ret, data = conn.fetch(message,'(RFC822)')
        msg = email.message_from_string(data[0][1])
        # rootMessage = myparser.parse(data[0][1])
        # print 'Message %s\n%s\n' % (message, rootMessage)
        print msg
        print '---------------------------------------------------------------'
conn.close()
As you can see, there is no decoding in this, because everything that I have tried has failed.
I am very new to Python, so if someone could steer me in the right direction I would really appreciate it. A hack would be all right, since it's not a mission-critical script, but a generic solution would be best.
-G
UPDATE:
There is no error; the problem is that the output is not decoded correctly.
Example input:
This is a test message.
Gísli
Output:
This is a test message.
G=EDsli
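
For reference, the `=ED` in the output looks like quoted-printable transfer encoding, where `=ED` stands for the byte 0xED (the letter í in ISO-8859-1). Printing `msg` shows the raw, still-encoded body. Below is a minimal sketch of one way to get the decoded plain-text parts, not a definitive solution. The sample `raw` message, its charset and the helper name `get_text` are assumptions made up for illustration; in the script above the raw message would come from `data[0][1]`.

```python
import email

# Hypothetical raw message standing in for data[0][1]. The charset and
# transfer encoding are assumptions chosen to reproduce "G=EDsli".
raw = (
    "Content-Type: text/plain; charset=iso-8859-1\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "This is a test message.\r\n"
    "G=EDsli\r\n"
)

def get_text(msg):
    """Return the decoded text/plain parts of a message as one string.

    Hypothetical helper: HTML parts and multipart containers are skipped.
    """
    chunks = []
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        # decode=True undoes the quoted-printable or base64 transfer encoding
        payload = part.get_payload(decode=True)
        # Fall back to UTF-8 when no charset is declared (a guess, not a rule)
        charset = part.get_content_charset() or 'utf-8'
        chunks.append(payload.decode(charset, 'replace'))
    return u''.join(chunks)

msg = email.message_from_string(raw)
text = get_text(msg)
```

Here `text` holds the body with `=ED` turned back into í. Filtering on `text/plain` is also what leaves out the HTML alternative when a message carries both.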