I wrote this code to get the content of an email from a Gmail account using imaplib. So far I have gotten to the point where I can see the content in the form shown below.

Here is the code:

```python
import imaplib
from email.parser import HeaderParser


user = "email"
password = "pass"

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login(user, password)
mail.list()
mail.select('inbox')

result, data = mail.search(None, "ALL")

ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_id = id_list[-1] # get the latest
result, data = mail.fetch(latest_email_id, '(BODY.PEEK[TEXT])') # fetch the email body for the given ID

header_data = data[0][1] # the body as raw text; BODY.PEEK[TEXT] omits the message's top-level headers

parser = HeaderParser()
msg = parser.parsestr(header_data)


print msg  # Python 2 print statement
```

When I run this code, I get this:

```
From nobody Fri Mar 02 14:58:48 2018

--089e0823549c5eeef8056673549d
Content-Type: text/plain; charset="UTF-8"

this is the body

--089e0823549c5eeef8056673549d
Content-Type: text/html; charset="UTF-8"

<div dir="ltr">this is the body</div>

--089e0823549c5eeef8056673549d--
```

I tried using beautifulsoup4, but that did not work out. How can I take just the body inside `<div dir="ltr">this is the body</div>` from what is being returned and save it in a variable? Also, is there another way of doing this?
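If only the text inside the `<div>` is needed, the standard library's `html.parser` can stand in for BeautifulSoup. This is a minimal Python 3 sketch, assuming the `text/html` part has already been isolated as a string; `DivTextExtractor` and `div_text` are hypothetical names. It collects every piece of text inside any `<div>`, so nested tags such as `<b>` or `<br>` in a real Gmail message are flattened into plain text.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text that appears inside <div> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <div> tags we are currently inside
        self.chunks = []  # text fragments found inside a <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while inside at least one <div>
        if self.depth > 0:
            self.chunks.append(data)


def div_text(html):
    """Return the concatenated text found inside <div> elements of an HTML string."""
    parser = DivTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks).strip()


body = div_text('<div dir="ltr">this is the body</div>')
print(body)
```

The result is an ordinary string saved in `body`, so it can be stored or compared like any other variable.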

Thanks in advance

  • You are looking at the message's MIME structure. You'll want to familiarize yourself with the Python `email` library to handle it. – tripleee Mar 02 '18 at 20:39
  • See perhaps [this](https://stackoverflow.com/questions/49041958/reading-email-attachments-with-smtpserver-in-python-3) and [this](https://stackoverflow.com/questions/48562935/what-are-the-parts-in-a-multipart-email) – tripleee Mar 02 '18 at 20:44
  • ... Though you'll want to fetch the entire RFC822 message in order to get the proper MIME headers. – tripleee Mar 02 '18 at 20:48
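Following the comments above, this is a minimal Python 3 sketch of fetching the whole message and walking its MIME parts with the standard `email` package. The helper names `extract_parts` and `fetch_latest_parts` are hypothetical. It fetches `BODY.PEEK[]`, which returns the full message like `RFC822` but, like the original `BODY.PEEK[TEXT]`, does not mark it as read; using `(RFC822)` instead would also work but sets the `\Seen` flag. Only `text/*` parts are kept, one per content type, which is an assumption that fits a simple `multipart/alternative` message like the one above.

```python
import email
import imaplib


def extract_parts(raw_message):
    """Return {content_type: decoded_text} for each text part of a raw RFC 822 message."""
    msg = email.message_from_bytes(raw_message)
    parts = {}
    for part in msg.walk():  # visits the root message and every nested part
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text attachments
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        charset = part.get_content_charset() or "utf-8"
        parts[part.get_content_type()] = payload.decode(charset, errors="replace")
    return parts


def fetch_latest_parts(user, password, host="imap.gmail.com"):
    """Fetch the latest inbox message with its headers and split it into text parts."""
    mail = imaplib.IMAP4_SSL(host)
    mail.login(user, password)
    mail.select("inbox")
    _, data = mail.search(None, "ALL")
    latest_id = data[0].split()[-1]
    # BODY.PEEK[] = entire message (headers + body) without setting \Seen
    _, data = mail.fetch(latest_id, "(BODY.PEEK[])")
    mail.logout()
    return extract_parts(data[0][1])  # data[0] is (envelope info, raw bytes)
```

Usage: `parts = fetch_latest_parts(user, password)` followed by `parts.get("text/html")` or `parts.get("text/plain")`. Because the top-level `Content-Type` header with the boundary is now included, the `email` package splits the parts itself instead of leaving the boundary lines in the text.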
