I'm trying to get one sentence from a lot of HTML emails. The sentence is located in the exact same place in every email (including the same line if you view the source code).
So far I have used imaplib to set up the connection to the correct mailbox, then search for and fetch the body of the email.
response_code_fetch, data_fetch = mail.fetch('1', '(BODY.PEEK[TEXT])')
if response_code_fetch == "OK":
    print("Body Text: " + str(data_fetch[0]))
else:
    print("Unable to find requested messages")
However, I get an incoherent list that has the entire body of the email at index [0] of the returned list. I've tried str(data_fetch[0]) and then using the splitlines method, but it doesn't work.
I've also found the below suggestion online using the email module, but it doesn't seem to work as it prints the else statement.
my_email = email.message_from_string(data_fetch)
body = ""
if my_email.is_multipart():
    for part in my_email.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))
        print(ctype, cdispo)
# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
    print("Email is not multipart")
    body = my_email.get_payload(decode=True)
print(body)
I won't include the whole result as it's very long but it basically looks like I get the code for the email, HTML formatting, body text and all:
Body Text: [(b'1 (BODY[TEXT] {78687}', b'--_av-uaAIyctTRCxY0f6Fw54pvw\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n
Does anyone know how I can get that one sentence out of the body text?
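One possible direction, as a rough sketch rather than a confirmed fix: each item imaplib returns is a tuple whose second element holds the raw bytes, so the email parser needs data_fetch[0][1], not the list itself. Also, BODY[TEXT] leaves out the headers, and without the top-level Content-Type header the parser cannot see the multipart boundary, which may be why is_multipart() comes back False. The helper below assumes the whole message was fetched with '(BODY.PEEK[])' and that the sentence sits in the text/plain part. The name extract_line and the line index are made up for illustration.

```python
import email
from email import policy


def extract_line(raw_message: bytes, line_index: int) -> str:
    """Return one line (0-based) from the main text part of a raw message."""
    # policy.default makes the parser return an EmailMessage, which has get_body()
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # Prefer the text/plain part, fall back to text/html if there is none
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        raise ValueError("no text part found in message")
    # get_content() undoes quoted-printable/base64 and decodes the charset
    text = part.get_content()
    return text.splitlines()[line_index]


# Hypothetical usage with the connection from the question (not run here):
# status, data = mail.fetch('1', '(BODY.PEEK[])')  # whole message, headers included
# if status == "OK":
#     print(extract_line(data[0][1], 3))  # 3 is a placeholder line index
```

If the sentence only exists in the HTML part, swapping the preference order to ("html", "plain") should return the markup instead, and the line would then still need its tags stripped.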