2

(Note: this question has nothing to do with encoding, as should be clear by reading it. Ignore the suggestion above.)

I'm learning Python and figured a nice first tool would be something that grabs some emails over IMAP and displays a given header. The following is basically my script:

#!/usr/bin/env python3

from imaplib import IMAP4_SSL
from netrc import netrc
from email import message_from_bytes

conn = IMAP4_SSL('imap.gmail.com')
auth = netrc().hosts['imap.gmail.com']

conn.login(auth[0], auth[2])
conn.select()

typ, data = conn.search(None, 'ALL')
i = 0
for num in reversed(data[0].split()):
    i += 1
    typ, data = conn.fetch(num, '(RFC822)')
    email = message_from_bytes(data[0][1])
    print("%i: %s" % (int(num), email.get('subject')))
    if i == 5:
        break
conn.close()
conn.logout()

The frustrating thing is that the header comes back folded, so the line breaks from the underlying email text show through instead of the actual value of the header.

How can I get the correctly unfolded header value? I'd like to stick with core Python 3 stuff, but I'm open to external deps if I must.

Frew Schmidt
  • There is a [provided method to decode the header](https://docs.python.org/3.5/library/email.header.html#email.header.decode_header). – Bob Dylan Dec 07 '15 at 19:58
  • Possible duplicate of [Python - email header decoding UTF-8](http://stackoverflow.com/questions/7331351/python-email-header-decoding-utf-8) – Bob Dylan Dec 07 '15 at 19:58
  • No, this is not about UTF-8 – Frew Schmidt Dec 07 '15 at 20:27
  • did you read the top answer that shows using `from email.header import decode_header` ? Which I also linked to above? It will decode from any MIME type specified. – Bob Dylan Dec 07 '15 at 20:28
  • Yes, and I tried it. As I said, this is not about UTF-8 or character encoding. using that function makes exactly no difference. – Frew Schmidt Dec 07 '15 at 20:33
  • Out of curiosity, what exactly is your output, and what do you mean by folded? I'm getting the header in plain text. – JustMe Dec 07 '15 at 20:55
  • A folded header is one with a newline in it. Most emails do not have folded subjects but it looks to me like most github issues do. – Frew Schmidt Dec 07 '15 at 20:59
  • With Python 2, some header lines get wrapped between `msg = email.message_from_file()` and `msg.as_string()`. With Python 3 the same message comes back intact. (This is Python 2.7.16 vs. Python 3.7.3.) – Ale Aug 27 '20 at 16:43
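To make the disagreement in the comments concrete, here is a minimal sketch with no IMAP connection involved. The message bytes in `RAW` are made up for illustration. It parses a folded subject the same way the script does (no policy argument) and then passes the result to `decode_header`, which only deals with encoded words and so should leave the line break in place.

```python
from email import message_from_bytes
from email.header import decode_header

# Hypothetical message whose subject is folded across two lines.
RAW = (
    b"From: someone@example.com\r\n"
    b"Subject: A long issue title that the sender\r\n"
    b" folded onto a second line\r\n"
    b"\r\n"
    b"Body\r\n"
)

# No policy argument, so the legacy compat32 policy is used.
msg = message_from_bytes(RAW)
subject = msg.get("subject")
print(repr(subject))  # the line break is still part of the string

# decode_header handles RFC 2047 encoded words; it does not unfold.
decoded = decode_header(subject)
print(decoded)
```

Since this subject contains no encoded words, `decode_header` has nothing to decode, which matches the observation that it "makes exactly no difference" for folding.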

2 Answers

1

Use Policy Objects to enable unfolding in the Python email package. In your script, you would have to add:

from email.policy import SMTPUTF8

to import the policy SMTPUTF8, and later use that when calling message_from_bytes:

email = message_from_bytes(data[0][1], policy=SMTPUTF8)

I tried your script with Python 3.9.5. In fact, every policy except compat32 (the one used when the policy parameter is absent) enabled unfolding.
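The difference between the policies can be checked without a mailbox. The following is a small sketch that parses the same bytes twice, once with `compat32` and once with `email.policy.default`; the message in `RAW` is invented for the example.

```python
from email import message_from_bytes
from email.policy import compat32, default

# Hypothetical message whose subject is folded across two lines.
RAW = (
    b"From: someone@example.com\r\n"
    b"Subject: A long issue title that the sender\r\n"
    b" folded onto a second line\r\n"
    b"\r\n"
    b"Body\r\n"
)

legacy = message_from_bytes(RAW, policy=compat32)  # same as omitting policy
modern = message_from_bytes(RAW, policy=default)

legacy_subject = legacy["subject"]
modern_subject = str(modern["subject"])

print(repr(legacy_subject))  # keeps the line break from the raw message
print(repr(modern_subject))  # unfolded onto a single line
```

With the newer policies the header value is also decoded, so encoded words in a subject should come back as readable text without a separate `decode_header` call.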

Günther Bayler
0

TL;DR: strip newlines

I'd love it if there were a simple answer to this, so if you have a better one feel free to add it. In the meantime, this sorta hacky solution works:

#!/usr/bin/env python3

from imaplib import IMAP4_SSL
from netrc import netrc
from email import message_from_bytes
import re

conn = IMAP4_SSL('imap.gmail.com')
auth = netrc().hosts['imap.gmail.com']

conn.login(auth[0], auth[2])
conn.select()

typ, data = conn.search(None, 'ALL')
i = 0
for num in reversed(data[0].split()):
    i += 1
    typ, data = conn.fetch(num, '(RFC822)')
    email = message_from_bytes(data[0][1])
    raw_header = email.get('subject')
    # Unfold: drop the CR/LF characters that folding inserted
    header = re.sub('[\r\n]', '', raw_header)

    print("%i: %s" % (int(num), header))
    if i == 5:
        break
conn.close()
conn.logout()
Frew Schmidt
  • If that's all there is to it, it might be faster (just a bold, untested statement) to avoid the regex and use the Python builtin `header = header.replace("\r\n", "")`... well, OK, I ran some tests. It's about 10 times faster in my case. – JustMe Dec 07 '15 at 21:19
  • It could be either \r\n or just \n. I'm not super worried about performance here. – Frew Schmidt Dec 07 '15 at 21:53
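The two approaches from these comments can be put side by side. This is only a sketch: `unfold_regex` and `unfold_replace` are hypothetical helper names, and both simply delete CR and LF characters while leaving the whitespace that starts the continuation line.

```python
import re


def unfold_regex(value: str) -> str:
    """Delete every CR and LF, as in the answer's re.sub call."""
    return re.sub(r"[\r\n]", "", value)


def unfold_replace(value: str) -> str:
    """Same result without a regex; handles both CRLF and bare LF."""
    return value.replace("\r", "").replace("\n", "")


crlf_folded = "A long issue title\r\n folded onto a second line"
lf_folded = "A long issue title\n folded onto a second line"

for sample in (crlf_folded, lf_folded):
    print(repr(unfold_regex(sample)), repr(unfold_replace(sample)))
```

Chaining two `replace` calls covers the bare `\n` case that a single `replace("\r\n", "")` would miss.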