I want to decode 'quoted-printable' encoded strings in Python, but I seem to be stuck at a point.

I fetch certain mails from my Gmail account with the following code:

import imaplib
import email
import quopri


mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('mail@gmail.com', '*******')
mail.list()

mail.select('"[Gmail]/All Mail"') 



typ, data = mail.search(None, 'SUBJECT', '"{}"'.format('123456'))

data[0].split()

print(data[0].split())

for e_mail in data[0].split():
    # Use a separate name so the search result in `data` is not overwritten
    typ, msg_data = mail.fetch(e_mail.decode(), '(RFC822)')
    raw_mail = msg_data[0][1]
    email_message = email.message_from_bytes(raw_mail)
    if email_message.is_multipart():
        for part in email_message.walk():
            if part.get_content_type() == 'text/plain':
                body = part.get_payload()
                to = email_message['To']

                # This is the step that does not work for headers
                utf = quopri.decodestring(to)

                text = utf.decode('utf-8')
                print(text)
.
.
.

For example, if the To header contains characters like é, á or ó, printing 'to' gives this:

=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=

I can successfully decode the quoted-printable encoded 'body' with the quopri library like this:

quopri.decodestring(sometext).decode('utf-8') 

But the same logic doesn't work for other parts of the e-mail, such as the To, From and Subject headers.

Does anyone have a hint?

Peter Petocz

3 Answers

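The header value you have is not plain quoted-printable, so quopri alone cannot decode it. It is an RFC 2047 "encoded-word" of the form =?charset?encoding?text?=, where the encoding letter is either B (base64) or Q (a variant of quoted-printable). In your example the letter is B, so the text is base64-encoded UTF-8. You can decode it with the standard library: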

from email.header import decode_header

result = decode_header('=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=')
# ^ the result is a list of tuples of the form [(decoded_bytes, encoding),]

for data, encoding in result:
    print(data.decode(encoding))
    # outputs: Péter Petőcz
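
The loop above assumes every part comes with a charset. For a header that mixes encoded and plain text, such as a display name followed by an address, decode_header returns the plain parts with an encoding of None. A minimal sketch of a more forgiving variant, assuming you just want a str back, is to hand the pairs to make_header. The helper name decode_mime_header and the example.com addresses are made up for illustration:

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Return an RFC 2047 encoded header value as a plain str.

    decode_mime_header is a hypothetical helper name, not a library function.
    """
    if value is None:
        # The header is missing from the message
        return None
    # make_header reassembles the (data, charset) pairs from decode_header
    # and copes with parts whose charset is None.
    return str(make_header(decode_header(value)))


print(decode_mime_header('=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?='))
print(decode_mime_header('plain@example.com'))
```

In the question's loop this would be used as decode_mime_header(email_message['To']), and likewise for From and Subject.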
ccpizza

You are trying to decode Latin characters using UTF-8. The output you are getting is base64. It reads:

No printable characters found, try another source charset, or upload your data as a file for binary decoding.

Give this a try: "Python: Converting from ISO-8859-1/latin1 to UTF-8".

an_owl

This solves it:

from email.header import decode_header


def mail_header_decoder(header):
    """Decode an RFC 2047 encoded header value into a str."""
    if header is None:
        return None
    mail_header_decoded = decode_header(header)
    # If no part has a charset, nothing was encoded: return the header as-is
    if all(charset is None for _, charset in mail_header_decoded):
        return header
    header_new = []
    for header_part, charset in mail_header_decoded:
        # Parts are bytes here; fall back to UTF-8 when no charset is given
        header_new.append(header_part.decode(charset or 'utf-8'))
    return ''.join(header_new)  # convert list to string
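
As an alternative sketch, not taken from the answers above: when the message is parsed with policy=email.policy.default, the email package decodes encoded headers on access, so no separate helper is needed. The raw message below is made-up sample data standing in for the bytes returned by mail.fetch:

```python
import email
from email import policy

# Made-up sample message; in the question this would be data[0][1] from fetch
raw_mail = (
    b"To: =?UTF-8?B?UMOpdGVyIFBldMWRY3o=?= <peter@example.com>\r\n"
    b"Subject: =?UTF-8?Q?Sz=C3=A1mla_123456?=\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"K=C3=B6sz=C3=B6n=C3=B6m\r\n"
)

# policy.default makes header access return decoded text
msg = email.message_from_bytes(raw_mail, policy=policy.default)

to = str(msg['To'])
subject = str(msg['Subject'])

# get_content() undoes the quoted-printable transfer encoding and the charset
body = msg.get_body(preferencelist=('plain',)).get_content()

print(to)
print(subject)
print(body.strip())
```

The trade-off is that header values become header objects rather than plain strings, hence the str() calls.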
Peter Petocz