I'm trying to write an IMAP connector in Python to retrieve emails before processing them. For now I'm testing with Gmail.
Here is the email I'm trying to process: https://i.stack.imgur.com/Nrb1C.png
Here is what my code does for now:
import email
import chardet
import imaplib
for mail_id in data:
    res, data = self.conn.fetch(mail_id, '(BODY[HEADER.FIELDS (FROM)] BODY[HEADER.FIELDS (TO)] UID BODY[TEXT])')
    raw_body = data[0][1]
    encoding_body = chardet.detect(raw_body)['encoding']
    body = email.message_from_string(raw_body.decode(encoding_body))
    dest_email = data[1][1]
    sender_email = data[2][1]
    f = open('/home/nathan/PycharmProjects/test.html', 'w+')
    f.write(body.get_payload())
    f.close()
I don't get any error, but here is the resulting body:
Any idea why these weird characters show up? Also, the accented characters aren't decoded correctly.
Thanks in advance
EDIT:
I have already tried the following:
res, data = self.conn.fetch(mail_id, '(RFC822)')
msg = email.message_from_bytes(data[0][1])
for part in msg.walk():
    if part.get_content_type() == 'text/html':
        print(part)
This gives me only the HTML part, but with the same encoding issue.
FINAL EDIT:
Thanks to @Botje, it turns out that the email was encoded as quoted-printable. So here is the solution for that type of encoding:
import quopri
res, data = self.conn.fetch(mail_id, '(RFC822)')
msg = email.message_from_bytes(data[0][1])
for part in msg.walk():
    if part.get_content_type() == 'text/html':
        part = quopri.decodestring(part.get_payload()).decode('UTF-8')
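The snippet above only covers quoted-printable bodies in UTF-8. A more general variant could rely on `get_payload(decode=True)`, which undoes the Content-Transfer-Encoding (quoted-printable or base64) declared in the part's headers, and on `get_content_charset()` for the character set. The following is only a sketch: the helper name `extract_html`, the UTF-8 fallback and the sample message are assumptions made for illustration, not part of my actual connector.

```python
import email
from email.message import EmailMessage


def extract_html(raw_bytes):
    """Return the decoded text/html parts of a raw RFC 822 message (hypothetical helper)."""
    msg = email.message_from_bytes(raw_bytes)
    html_parts = []
    for part in msg.walk():
        if part.get_content_type() == 'text/html':
            # decode=True reverses the transfer encoding (quoted-printable, base64)
            # and returns bytes.
            payload = part.get_payload(decode=True)
            # Assumption: fall back to UTF-8 when the part declares no charset.
            charset = part.get_content_charset() or 'utf-8'
            html_parts.append(payload.decode(charset, errors='replace'))
    return html_parts


# Made-up sample message standing in for the bytes returned by
# conn.fetch(mail_id, '(RFC822)'), i.e. data[0][1].
sample = EmailMessage()
sample['From'] = 'sender@example.com'
sample['To'] = 'dest@example.com'
sample['Subject'] = 'Test'
sample.set_content('<p>Déjà vu, café</p>', subtype='html', cte='quoted-printable')

raw = sample.as_bytes()      # the accented characters are stored as =C3=A9 etc.
html = extract_html(raw)     # the accented characters are readable again
print(html[0])
```

With a real mailbox, `raw` would be replaced by `data[0][1]` from the `(RFC822)` fetch. The advantage over calling `quopri` directly is that the same code should also work if another sender uses base64 or a different charset.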