I am trying to use Python and email.Parser to parse an email from a file. I use the following command

headers = Parser().parse(open(filename, 'r'))

to parse the file. But when I try to get the body I use e.g.

print(headers.get_payload()[0])

and I get something like

From nobody Mon Oct 12 16:32:25 2015
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi Alex,
....

Is there some way to get rid of those first three or four lines? And how do I decode content like 'fr=C3=BCher'?

Alex

2 Answers

To get to the message body, you have to walk() its different parts, i.e.:

import email

# message_from_file() is shorthand for Parser().parse()
with open(filename, 'r') as f:
    msg = email.message_from_file(f)

body = ''
if msg.is_multipart():
    for part in msg.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))

        # take the first text/plain part, skipping any text/plain (txt) attachments
        if ctype == 'text/plain' and 'attachment' not in cdispo:
            body = part.get_payload(decode=True)  # undo the transfer encoding
            break
else:
    # not multipart, i.e. plain text with no attachments
    body = msg.get_payload(decode=True)

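Passing decode=True to get_payload() undoes the Content-Transfer-Encoding (base64 or quoted-printable), which is what turns strings like 'fr=C3=BCher' back into the original characters. Note that in Python 3 the result is a bytes object, so it still has to be decoded with the part's charset to get text.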
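As a rough, self-contained sketch of that last step (Python 3 assumed): the sample message `RAW`, the helper name `extract_text`, and the fallback to UTF-8 when no charset is declared are all made up for illustration and are not part of the question.

```python
import email

# Hypothetical sample message, invented for this example.
RAW = (
    "From: sender@example.com\n"
    "To: alex@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Hi Alex, fr=C3=BCher\n"
)


def extract_text(msg):
    """Return the first non-attachment text/plain part as str (hypothetical helper)."""
    # walk() also yields the message itself, so this covers non-multipart mail too.
    for part in msg.walk():
        cdispo = str(part.get('Content-Disposition'))
        if part.get_content_type() == 'text/plain' and 'attachment' not in cdispo:
            raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
            # Falling back to UTF-8 is an assumption for parts without a charset.
            charset = part.get_content_charset() or 'utf-8'
            return raw.decode(charset, errors='replace')
    return ''


msg = email.message_from_string(RAW)
body = extract_text(msg)
print(body)
```

Because only the payload of the chosen part is returned, the header lines from the question's output (Content-Type, Content-Transfer-Encoding) should not appear in `body`.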

Todor Minakov