
I'm trying to build an IMAP connector in Python to retrieve mail before processing it. For now I'm testing with Gmail.

Here is the mail I'm trying to process: https://i.stack.imgur.com/Nrb1C.png

Here is what my code currently does:

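import email
import chardet
import imaplib

# `data` holds the message IDs found earlier; the fetch result gets its own
# name so it doesn't shadow the list being iterated
for mail_id in data:
    res, msg_data = self.conn.fetch(mail_id, '(BODY[HEADER.FIELDS (FROM)] BODY[HEADER.FIELDS (TO)] UID BODY[TEXT])')
    raw_body = msg_data[0][1]
    # chardet guesses the charset of the raw bytes; it can return None
    encoding_body = chardet.detect(raw_body)['encoding'] or 'utf-8'
    body = email.message_from_string(raw_body.decode(encoding_body))
    dest_email = msg_data[1][1]
    sender_email = msg_data[2][1]

    # get_payload() with no arguments does not undo any transfer encoding
    with open('/home/nathan/PycharmProjects/test.html', 'w', encoding='utf-8') as f:
        f.write(body.get_payload())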
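A side note on the snippet above: the FROM and TO items come back as raw header lines rather than bare addresses, and a display name containing accents is usually sent as an RFC 2047 encoded word (`=?utf-8?q?...?=`), which is another place accented characters can look garbled. A minimal sketch of parsing such a header block with the standard library, assuming the bytes look like the sample below; `parse_address_header` is a hypothetical helper name, not part of any library.

```python
import email
from email import policy


def parse_address_header(raw_header):
    """Parse a raw header block such as a BODY[HEADER.FIELDS (FROM)] payload.

    Returns a list of (display_name, addr_spec) tuples. The default policy
    decodes RFC 2047 encoded words in display names.
    """
    # The header block is parsed as a tiny message with no body
    msg = email.message_from_bytes(raw_header, policy=policy.default)
    # Use whichever address header is present (From or To)
    header = msg['From'] or msg['To']
    if header is None:
        return []
    return [(a.display_name, a.addr_spec) for a in header.addresses]


# Hypothetical sample of what the server might return for the FROM item
raw_from = b'From: =?utf-8?q?Jos=C3=A9?= <jose@example.com>\r\n\r\n'
print(parse_address_header(raw_from))
```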

I don't get any error, but here is the resulting body:

https://pastebin.com/fhzqsCRM

Any idea why these weird characters appear? Also, accented characters aren't decoded correctly.

Thanks in advance
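For context on what the final edit confirms: in quoted-printable text every non-ASCII byte is written as `=XX` (so a UTF-8 `é` becomes `=C3=A9`) and long lines are split with a soft line break, a trailing `=`. A small sketch with the standard `quopri` module, using made-up sample text, shows both effects and how decoding (bytes first, then the charset) reverses them.

```python
import quopri

# Accented text encoded as UTF-8 bytes, as an HTML body would be stored
text = 'Réunion à 14h'
encoded = quopri.encodestring(text.encode('utf-8'))
print(encoded)  # b'R=C3=A9union =C3=A0 14h'

# Lines longer than 76 characters are split with a soft line break ("=" + newline)
long_encoded = quopri.encodestring(('é' * 40).encode('utf-8'))
print(long_encoded)

# Decoding undoes the transfer encoding, then the bytes are decoded as UTF-8
decoded = quopri.decodestring(encoded).decode('utf-8')
print(decoded)
```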

EDIT:

I've already tried the following:

import email

res, data = self.conn.fetch(mail_id, '(RFC822)')
msg = email.message_from_bytes(data[0][1])
# Walk every MIME part and print the HTML one
for part in msg.walk():
    if part.get_content_type() == 'text/html':
        print(part)

This gives me only the HTML part, but with the same encoding issue.
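An alternative that avoids handling transfer encodings and charsets by hand is the newer `email` API (Python 3.6+): parsing with `policy=email.policy.default` returns an `EmailMessage`, whose `get_body()` picks the preferred body part and whose `get_content()` undoes both the transfer encoding and the charset. A minimal sketch; `html_body` is a hypothetical helper name, and the sample message is built in memory rather than fetched over IMAP.

```python
import email
from email import policy
from email.message import EmailMessage


def html_body(raw_message):
    """Return the decoded HTML body of a raw RFC822 message, or None."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # Pick the text/html body part if the message has one
    part = msg.get_body(preferencelist=('html',))
    if part is None:
        return None
    # get_content() decodes quoted-printable/base64 and the declared charset
    return part.get_content()


# Sample message with a plain-text part and a quoted-printable HTML part
sample = EmailMessage()
sample['From'] = 'sender@example.com'
sample['To'] = 'dest@example.com'
sample['Subject'] = 'Test'
sample.set_content('Réunion à 14h')
sample.add_alternative('<p>Réunion à 14h</p>', subtype='html', cte='quoted-printable')

print(html_body(sample.as_bytes()))
```

With IMAP, the bytes passed in would be the `data[0][1]` of an `(RFC822)` fetch, as in the snippet above.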

FINAL EDIT:

Thanks to @Botje, it turns out the e-mail body was encoded as quoted-printable (its Content-Transfer-Encoding). Here is the solution for that type of encoding:

import email
import quopri

res, data = self.conn.fetch(mail_id, '(RFC822)')
msg = email.message_from_bytes(data[0][1])
for part in msg.walk():
    if part.get_content_type() == 'text/html':
        # Undo the quoted-printable encoding, then decode the bytes as UTF-8
        html = quopri.decodestring(part.get_payload()).decode('UTF-8')
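The snippet above assumes every HTML part is quoted-printable and UTF-8. Mail can also arrive base64-encoded or in another charset (for example ISO-8859-1), in which case `quopri` plus a hard-coded `'UTF-8'` would not decode it correctly. A more general sketch using the same legacy `email` API as the question: `get_payload(decode=True)` undoes whichever Content-Transfer-Encoding the part declares, and `get_content_charset()` reads the charset from Content-Type. `decode_part` and `html_parts` are hypothetical helper names, and the sample message is built in memory rather than fetched over IMAP.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def decode_part(part):
    """Return the text of a single (non-multipart) part."""
    # decode=True undoes quoted-printable or base64 and returns bytes
    payload = part.get_payload(decode=True)
    # Use the charset from Content-Type; UTF-8 is an assumed fallback
    charset = part.get_content_charset() or 'utf-8'
    return payload.decode(charset, errors='replace')


def html_parts(raw_message):
    """Decode every text/html part of a raw RFC822 message (bytes)."""
    msg = email.message_from_bytes(raw_message)
    return [decode_part(p) for p in msg.walk()
            if p.get_content_type() == 'text/html']


# Sample message: one base64 UTF-8 part, one quoted-printable ISO-8859-1 part
outer = MIMEMultipart()
outer.attach(MIMEText('<p>Réunion à 14h</p>', 'html', 'utf-8'))
outer.attach(MIMEText('<p>Réunion à 14h</p>', 'html', 'iso-8859-1'))

for html in html_parts(outer.as_bytes()):
    print(html)
```

Reading the charset from each part, rather than guessing it with `chardet`, relies on the sender declaring it correctly, which is the normal case for well-formed mail.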
Nathan Cheval
