
I have an .mbox file containing many messages, stored at the path mbox_fname. In Python 3, I have already loaded each of the messages, which are objects of the class email.message.Message.

I'd like to access the body content of each message.

For instance, something like:

import mailbox
the_mailbox = mailbox.mbox(mbox_fname)

for message in the_mailbox:
    subject = message["subject"]
    content = ...  # <???> how do I get the body here?

How do I access the body of the message?

canary_in_the_data_mine
  • 2,193
  • 2
  • 24
  • 28

1 Answer


I made some progress by modifying this answer. This is the best I have so far:

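import email.message


def get_body(message: email.message.Message, encoding: str = "utf-8") -> str:
    body_in_bytes = b""  # bytes, so .decode() below works even if no part matches
    charset = None
    if message.is_multipart():
        for part in message.walk():
            ctype = part.get_content_type()
            cdispo = str(part.get("Content-Disposition"))

            # skip any text/plain (txt) attachments
            if ctype == "text/plain" and "attachment" not in cdispo:
                # decode=True undoes the base64 / quoted-printable transfer encoding
                body_in_bytes = part.get_payload(decode=True) or b""
                charset = part.get_content_charset()
                break
    # not multipart - i.e. plain text, no attachments, keeping fingers crossed
    else:
        body_in_bytes = message.get_payload(decode=True) or b""
        charset = message.get_content_charset()

    # use the charset the message declares, falling back to `encoding`;
    # errors="replace" avoids UnicodeDecodeError on malformed bytes
    try:
        body = body_in_bytes.decode(charset or encoding, errors="replace")
    except LookupError:  # unknown charset name in the header
        body = body_in_bytes.decode(encoding, errors="replace")

    return body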

So, adapting the loop from the original question, get_body is called like this:

for message in the_mailbox:
    content = get_body(message)
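
An alternative that may be simpler, offered as a sketch rather than something tested against your particular mbox: if the mailbox is opened with a factory that re-parses each message using email.policy.default, the loop yields email.message.EmailMessage objects. Those have get_body(), which already skips parts marked as attachments, and get_content(), which undoes the transfer encoding and decodes with the part's declared charset. The helper names open_mbox and get_body_modern are hypothetical (my own), and the ("plain", "html") preference order is an assumption about what you want.

```python
import email.policy
import mailbox
from email.message import EmailMessage
from email.parser import BytesParser


def open_mbox(path: str) -> mailbox.mbox:
    """Open an mbox whose messages are parsed as EmailMessage objects.

    Hypothetical helper: the factory re-parses each raw message with
    email.policy.default instead of returning legacy mboxMessage objects.
    """
    return mailbox.mbox(path, factory=BytesParser(policy=email.policy.default).parse)


def get_body_modern(message: EmailMessage) -> str:
    """Return the message body as str, or "" if there is no usable text part.

    Hypothetical helper: get_body() ignores parts marked as attachments and,
    with this preference list, picks text/plain first, then text/html.
    """
    part = message.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    # get_content() undoes base64/quoted-printable and decodes the bytes
    # using the charset declared on that part.
    return part.get_content()
```

It slots into the question's loop as: for message in open_mbox(mbox_fname): content = get_body_modern(message). One design difference from the walk() version: this returns the HTML part when a message has no text/plain alternative, so drop "html" from the tuple if you only ever want plain text.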