
I'm trying to get only the text/plain section of an email I received, using imaplib in Python:

#!/usr/bin/env python

import imaplib
import sys
from pprint import pprint

from email.parser import HeaderParser
from email.header import decode_header

reload(sys)
sys.setdefaultencoding("utf-8")

conn = imaplib.IMAP4_SSL('host')
conn.login('username', 'password')
#conn.select('Inbox', readonly=True)
conn.select('Inbox')
a, b = conn.search(None, '(UNSEEN)')

if b[0]:
        c = b[0]
        d = c.split()
        e = d[-1]

        #(BODY[HEADER.FIELDS (SUBJECT FROM)])
        data = conn.fetch(e,'(BODY[1] BODY[HEADER.FIELDS (SUBJECT FROM)])')
        #pprint(data)

        body_data = data[1][0][1]
        header_data = data[1][1][1]
        #print(body_data)


#      parser = HeaderParser()
#      msg = parser.parsestr(header_data)
#
#      print (decode_header(msg['From'])[0][0].decode('utf-8') + ": " + decode_header(msg['Subject'])[0][0].decode('utf-8'))

In the case of a multipart message, though, "body_data" ends up containing the body of the message with all of its sections.

I tried with the example here:

raw_message = data[0][1]
msg = email.message_from_string(raw_message)
for part in msg.walk():
    # each part is a either non-multipart, or another multipart message
    # that contains further parts... Message is organized like a tree
    if part.get_content_type() == 'text/plain':
        print part.get_payload() # prints the raw text

However, for some reason, part.get_payload() returns only a single character from the text.

I'm forced to use Python 2.7.13.

tripleee
rebutia
  • What does the problematic message actually look like? – tripleee Nov 10 '19 at 13:44
  • It's a multipart message with 3 sections, last of them is a file attachment - 1) text/plain; charset="utf-8", 2) text/html; charset="utf-8" and 3) application/octet-stream. I'm interested only in the text/plain section. – rebutia Nov 10 '19 at 13:54
  • You should add a minimal example message which exhibits this behavior to the question itself. If you can provide a minimal reproducible example, you can probably remove the IMAP parts entirely. – tripleee Nov 10 '19 at 16:16

1 Answer


I was able to get what I wanted by changing BODY[1] to BODY[1.1] in the fetch command:

#!/usr/bin/env python

import imaplib
import sys

from email.parser import HeaderParser
from email.header import decode_header

reload(sys)
sys.setdefaultencoding("utf-8")

conn = imaplib.IMAP4_SSL('host')
conn.login('username', 'password')
#conn.select('Inbox', readonly=True)
conn.select('Inbox')
a, b = conn.search(None, '(UNSEEN)')

if b[0]:
        c = b[0]
        d = c.split()
        e = d[-1]  # last message number in the UNSEEN search result

        # BODY[1.1] addresses the first subpart of the first part of the message
        data = conn.fetch(e,'(BODY[1.1] BODY[HEADER.FIELDS (SUBJECT FROM)])')

        body_data = data[1][0][1]
        header_data = data[1][1][1]
        print(body_data) # For this message layout, "body_data" now contains only the "text/plain" section
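
As a rough illustration of why BODY[1.1] can select the text/plain section, the sketch below builds a sample message and lists its parts with IMAP-style section numbers. The layout is an assumption: a multipart/mixed message wrapping a multipart/alternative (plain text plus HTML) and an attachment, which would be consistent with the three sections described in the comments but is not confirmed by them. The helper imap_part_numbers is hypothetical, not part of imaplib or email, and it does not treat message/rfc822 parts specially.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def imap_part_numbers(msg, prefix=""):
    """Return (section, content_type) pairs in IMAP section order.

    Hypothetical helper for illustration only.
    """
    if not msg.is_multipart():
        # A non-multipart message has a single body part, addressed as "1".
        return [(prefix or "1", msg.get_content_type())]
    numbered = []
    for index, part in enumerate(msg.get_payload(), start=1):
        section = "%s.%d" % (prefix, index) if prefix else str(index)
        if part.is_multipart():
            # List the container itself, then its children below it.
            numbered.append((section, part.get_content_type()))
        numbered.extend(imap_part_numbers(part, section))
    return numbered


# Assumed layout: mixed(alternative(plain, html), attachment).
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("Hello in plain text", "plain", "utf-8"))
alternative.attach(MIMEText("<p>Hello in HTML</p>", "html", "utf-8"))

message = MIMEMultipart("mixed")
message.attach(alternative)
message.attach(MIMEApplication(b"\x00\x01", "octet-stream"))

sections = imap_part_numbers(message)
for section, content_type in sections:
    print("%s %s" % (section, content_type))
```

With this layout, section 1 is the whole multipart/alternative container, which is why BODY[1] returns every alternative, while 1.1 is its first child. A message with a different structure (for example, a plain non-multipart email) would need a different section number, so BODY[1.1] is not a general-purpose selector.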

...
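
An alternative that does not depend on the section number is to fetch the whole message (for example with BODY.PEEK[], which should leave the message unseen) and pick the text/plain part with the email package, as the walk() snippet in the question attempts. One plausible reason that snippet printed a single character, assuming "data" there was the unmodified return value of conn.fetch: fetch returns a (status, items) tuple, so data[0][1] indexes into the status string "OK" rather than into the message. The sketch below uses a simulated fetch response instead of a live server; first_plain_text is a hypothetical helper name.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Python 3 has message_from_bytes; Python 2.7 only has message_from_string.
parse_message = getattr(email, "message_from_bytes", email.message_from_string)


def first_plain_text(raw_message):
    """Return the decoded text of the first text/plain part, or None.

    Hypothetical helper for illustration only.
    """
    msg = parse_message(raw_message)
    for part in msg.walk():
        # Skip attachments that happen to be text/plain.
        if part.get_content_type() == "text/plain" and part.get_filename() is None:
            payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, "replace")
    return None


# Build a sample message and wrap it the way imaplib returns a fetch result.
sample = MIMEMultipart("alternative")
sample.attach(MIMEText("Hello in plain text", "plain", "utf-8"))
sample.attach(MIMEText("<p>Hello in HTML</p>", "html", "utf-8"))
raw = sample.as_string().encode("ascii")

header = ("1 (BODY[] {%d}" % len(raw)).encode("ascii")
response = ("OK", [(header, raw), b")"])  # simulated conn.fetch(...) result

typ, items = response          # unpack status and data first
raw_message = items[0][1]      # the literal following the fetch header
text = first_plain_text(raw_message)
print(text)

# Indexing the tuple directly reaches the status string instead of the message.
status_char = response[0][1]
```

Unpacking the tuple first (typ, items = conn.fetch(...)) makes it harder to index the status string by accident. Fetching the whole message transfers the attachment as well, so the section-number approach above may still be preferable for large messages.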
rebutia