
I want to retrieve the body (text only) of emails using Python's `imaplib` and `email` packages.

As per this SO thread, I'm using the following code:

```python
import email

mail = email.message_from_string(email_body)
bodytext = mail.get_payload()[0].get_payload()
```

It works fine in some cases, but sometimes I get a response like the following:

```
[<email.message.Message instance at 0x0206DCD8>, <email.message.Message instance at 0x0206D508>]
```
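A minimal sketch of why this happens, using a made-up message: `get_payload()` returns a `str` only for a single-part message. For a multipart part, or a `message/rfc822` part such as a forwarded email, it returns a list of `Message` objects, which is presumably what `mail.get_payload()[0]` is in the failing cases. `is_multipart()` tells the two cases apart.

```python
import email

# Made-up raw email: a multipart message whose second part is a
# forwarded message (message/rfc822), as a mail client might produce.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Fwd: hello
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

See the forwarded message below.
--XYZ
Content-Type: message/rfc822

From: carol@example.com
Subject: hello
Content-Type: text/plain

Original text.
--XYZ--
"""

mail = email.message_from_string(RAW)
for part in mail.get_payload():
    payload = part.get_payload()
    # A text/plain part yields a str; a message/rfc822 part yields a list
    # containing the nested Message object
    print(part.get_content_type(), part.is_multipart(), type(payload).__name__)
```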
biztiger
  • Does this answer your question? [Python : How to parse the Body from a raw email , given that raw email does not have a "Body" tag or anything](https://stackoverflow.com/questions/17874360/python-how-to-parse-the-body-from-a-raw-email-given-that-raw-email-does-not) – Todor Minakov Dec 04 '19 at 13:28

4 Answers


The main problem in my case was that replied or forwarded messages showed up as `Message` instances in `bodytext`.

Solved my problem using the following code:

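```python
bodytext = mail.get_payload()[0].get_payload()
# For a replied/forwarded message this is a list of Message objects, not a str
if isinstance(bodytext, list):
    bodytext = ','.join(str(v) for v in bodytext)
```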
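Note that `str(v)` on a `Message` renders the whole message, headers included, so the joined string above will also contain lines such as `Content-Type: text/plain`. A minimal sketch of a variant that recurses into the nested messages instead; `flatten_payload` is a hypothetical name, the sample message is made up, and the result is not base64/quoted-printable decoded.

```python
import email
from email.message import Message


def flatten_payload(msg: Message, sep: str = "\n") -> str:
    """Join the payloads of msg and any nested parts, without their headers.

    Hypothetical helper. get_payload() returns a str for a single-part message
    and a list of Message objects for multipart or message/rfc822 parts; we
    recurse into the list instead of calling str() on each item.
    Note: the returned text is NOT transfer-decoded (base64 stays base64).
    """
    payload = msg.get_payload()
    if isinstance(payload, list):
        return sep.join(flatten_payload(part, sep) for part in payload)
    return payload


# Made-up reply that carries the original message as a message/rfc822 part
RAW = """\
Content-Type: multipart/mixed; boundary="B"

--B
Content-Type: text/plain

Reply text.
--B
Content-Type: message/rfc822

Subject: original
Content-Type: text/plain

Quoted original.
--B--
"""

mail = email.message_from_string(RAW)
print(flatten_payload(mail))
```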
biztiger

You are assuming that messages have a uniform structure, with one well-defined "main part". That is not the case; there can be messages with a single part which is not a text part (just an "attachment" of a binary file, and nothing else), or a multipart with multiple textual parts (or, again, none at all), and even if there is only one, it need not be the first part. Furthermore, multiparts can be nested (one or more parts can itself be another MIME message, recursively).

In short, you must inspect the MIME structure, then decide which part(s) are relevant for your application. If you only receive messages from a fairly static, small set of clients, you may be able to cut some corners (at least until the next upgrade of Microsoft Plague hits), but in general there simply isn't a hierarchy of any kind, just a collection of (not necessarily directly related) equally important parts.
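One way to inspect the MIME structure is `Message.walk()`, which visits the message itself and every nested part depth-first. A minimal sketch, assuming the policy "keep every inline `text/plain` part"; `text_parts` is a hypothetical helper, the sample message is made up, and a real application may also want `text/html` as a fallback.

```python
import email
from email.message import Message


def text_parts(msg: Message) -> list[str]:
    """Return the decoded text of every text/plain part that is not an attachment."""
    texts = []
    for part in msg.walk():  # depth-first, includes msg itself and nested messages
        if part.is_multipart():
            continue  # containers (multipart/*, message/rfc822) have no text of their own
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue  # e.g. an attached .txt file, not body text
        raw = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or "utf-8"
        texts.append(raw.decode(charset, errors="replace"))
    return texts


# Made-up message: multipart/alternative body plus a text/plain attachment
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

SGVsbG8sIHdvcmxkIQ==
--inner
Content-Type: text/html; charset="utf-8"

<p>Hello, world!</p>
--inner--
--outer
Content-Type: text/plain
Content-Disposition: attachment; filename="notes.txt"

not body text
--outer--
"""

msg = email.message_from_string(RAW)
print(text_parts(msg))
```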

tripleee
  • Maybe see also [What are the “parts” in a multipart email?](/q/48562935) which has a longer answer of mine along the same lines. – tripleee Nov 07 '18 at 09:42
  • Python 3.6+ has a revamped `email` library with a method [`get_body`](https://docs.python.org/3/library/email.message.html#email.message.EmailMessage.get_body) which attempts to guess the "main body part" for you. – tripleee Mar 20 '19 at 17:14
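A minimal sketch of the Python 3.6+ approach from the comment above, using a made-up message: parse with `policy=email.policy.default` so the parser returns an `EmailMessage`, then call `get_body()` and `get_content()`, which also undoes the transfer encoding and charset. `get_body()` returns `None` when no matching part exists, so check for that.

```python
import email
from email import policy

# Made-up multipart/alternative message with a quoted-printable text part
RAW = b"""\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="alt"

--alt
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Caf=C3=A9 prices attached.
--alt
Content-Type: text/html; charset="utf-8"

<p>Caf&eacute; prices attached.</p>
--alt--
"""

# policy.default makes the parser return an EmailMessage, which has get_body()
msg = email.message_from_bytes(RAW, policy=policy.default)
body = msg.get_body(preferencelist=("plain", "html"))
if body is not None:
    # get_content() decodes quoted-printable and the charset into a str
    print(body.get_content())
```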

My external lib: https://github.com/ikvk/imap_tools

```python
from imap_tools import MailBox

# get list of email bodies from INBOX folder
with MailBox('imap.mail.com').login('test@mail.com', 'pwd', 'INBOX') as mailbox:
    bodies = [msg.text or msg.html for msg in mailbox.fetch()]
```
Vladimir

Maybe this post (of mine) can be of help. I receive a newsletter with prices for different kinds of oil in the US. I fetch the emails in Gmail whose subject matches a given pattern, then extract the prices from the mail body with a regex. So I have to access the body of the last n emails whose subject matches that pattern.

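I also use `email.message_from_string()`: `msg = email.message_from_string(response_part[1])`. (On Python 3, where `imaplib` returns bytes, the equivalent is `email.message_from_bytes(response_part[1])`.)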

So it may give you a concrete example of how to use the methods of the `email` package.
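A minimal sketch of the workflow described above (search by subject, fetch the last n matches, take the text body), assuming Python 3's `imaplib` and the 3.6+ `get_body()` API. This is not the code from the linked post; `last_n_bodies` is a hypothetical helper and the `SUBJECT` search criterion is an assumption about how the pattern is matched.

```python
import email
import imaplib
from email import policy


def last_n_bodies(conn: imaplib.IMAP4, subject: str, n: int) -> list[str]:
    """Return plain-text bodies of the last n messages whose Subject contains `subject`.

    Hypothetical helper: assumes `conn` is already logged in and a mailbox
    has been selected.
    """
    if n <= 0:
        return []
    typ, data = conn.search(None, "SUBJECT", f'"{subject}"')
    if typ != "OK":
        return []
    # SEARCH returns ascending sequence numbers; keep the last n in mailbox order
    ids = data[0].split()[-n:]
    bodies = []
    for num in ids:
        typ, msg_data = conn.fetch(num, "(RFC822)")
        if typ != "OK":
            continue
        for response_part in msg_data:
            # fetch() mixes (envelope, raw_bytes) tuples with closing b")" items
            if isinstance(response_part, tuple):
                # Python 3 imaplib returns bytes, hence message_from_bytes
                msg = email.message_from_bytes(response_part[1], policy=policy.default)
                body = msg.get_body(preferencelist=("plain",))
                if body is not None:
                    bodies.append(body.get_content())
    return bodies
```

With a real server you would first create the connection with `imaplib.IMAP4_SSL(host)`, then call `login(user, password)` and `select("INBOX")`; the regex that extracts the prices would then run over the returned strings.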

kiriloff