
I'm trying to get one sentence from a lot of HTML emails. The sentence is located in the exact same place in every email (including the same line if you view the source code).

So far I have used imaplib to set up the connection to the correct mailbox, search and fetch the body of the email.

response_code_fetch, data_fetch = mail.fetch('1', '(BODY.PEEK[TEXT])')
if response_code_fetch == "OK":
    print("Body Text: " + str(data_fetch[0]))
else:
    print("Unable to find requested messages")

However, I get an incoherent list that has the entire body of the email at index [0] of the returned list. I've tried str(data_fetch[0]) and then using the splitlines method, but it doesn't work.

I've also found the below suggestion online using the email module, but it doesn't seem to work as it prints the else statement.

my_email = email.message_from_string(data_fetch)
body = ""
if my_email.is_multipart():
    for part in my_email.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))
        print(ctype, cdispo)

# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
    print("Email is not multipart")
    body = my_email.get_payload(decode=True)
    print(body)

I won't include the whole result as it's very long but it basically looks like I get the code for the email, HTML formatting, body text and all:

Body Text: [(b'1 (BODY[TEXT] {78687}', b'--_av-uaAIyctTRCxY0f6Fw54pvw\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n

Does anyone know how I can get the one sentence out of the body text?

jonrsharpe
Jiggs

1 Answer


I think the b in front of your string makes it a bytes literal. What if you put a .decode('UTF-8') behind your Body Text string?
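A minimal sketch of that idea, under a few assumptions. imaplib returns a list whose first item is a tuple, so the raw bytes are at data_fetch[0][1], not data_fetch[0]. BODY[TEXT] leaves out the message headers, so the email module cannot tell that the message is multipart; fetching '(BODY.PEEK[])' instead should give it the whole message. The sample message, the helper name plain_text_lines and the line index 1 are made up for illustration; the real index depends on where the sentence sits in your emails.

```python
import email

# Stand-in for the result of mail.fetch('1', '(BODY.PEEK[])'):
# a list whose first item is an (envelope, raw_message_bytes) tuple.
# The message content here is invented for the example.
raw_message = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Hello,\r\n"
    b"Total =3D 42 EUR\r\n"
    b"Bye\r\n"
    b"--sep\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<p>Hello</p>\r\n"
    b"--sep--\r\n"
)
data_fetch = [(b"1 (BODY[] {0}", raw_message), b")"]


def plain_text_lines(fetch_data):
    """Return the lines of the first text/plain part, or [] if there is none."""
    raw_bytes = fetch_data[0][1]  # second element of the first tuple
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "utf-8"
            # decode=True undoes quoted-printable/base64 and returns bytes
            payload = part.get_payload(decode=True)
            return payload.decode(charset, errors="replace").splitlines()
    return []


lines = plain_text_lines(data_fetch)
# Hypothetical position: the wanted sentence is the second line of the body.
print(lines[1])
```

If the sentence only exists in the HTML part, the same loop could match "text/html" instead, and the markup would then need to be stripped separately.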

Roald