I am using the code below to read unread email.

`mail.fetch` returns `typ, data`, and we access the raw email with `raw_email = data[0][1]`. Could anyone explain why the index is hardcoded as `[0][1]` to get the message? Is there a proper way to get the message without hardcoding the index?

Python code below:

import email
import imaplib

# email_user and email_pass are assumed to be defined elsewhere
mail = imaplib.IMAP4_SSL('imap.gmail.com')
try:
    mail.login(email_user, email_pass)
    status, messages = mail.select("INBOX")

    (retcode, emailnums) = mail.search(None, '(UNSEEN)')
    if retcode == 'OK':
        for emailnum in emailnums[0].split():
            typ, data = mail.fetch(emailnum, '(RFC822)')
            raw_email = data[0][1]
            # converts the bytes object to a str
            raw_email_string = raw_email.decode('utf-8')
            email_message = email.message_from_string(raw_email_string)
finally:
    # close the connection whether or not the fetch succeeded
    mail.logout()

tripleee

1 Answer

The response from the IMAP server comes back as a nested structure: a status string plus a list of response parts. For a plain `(RFC822)` fetch of a single message, the first item in that list is a tuple holding the envelope information and the actual contents of the email you requested, which is why `data[0][1]` is the message. There is no simple way to avoid saying which parts of the response you need, though perhaps you will want to look for a higher-level wrapper around Python's low-level imaplib if you don't want to look at these nitty-gritty details of how things work on the protocol level (or rather, how the Python library represents what's moving over the wire).
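
As a rough sketch of what that structure looks like, the snippet below selects the message literals by type instead of by position. The helper name `iter_message_bytes` and the `sample_data` value are made up for illustration; the sample only imitates what imaplib typically hands back for a single `(RFC822)` fetch, so treat it as an assumption rather than a captured server response.

```python
import email


def iter_message_bytes(data):
    """Yield the literal payload of every (envelope, payload) tuple in data.

    Hypothetical helper: imaplib returns literals as 2-tuples and everything
    else (such as the closing b')') as plain bytes, so the type check below
    skips the non-message parts.
    """
    for part in data:
        if isinstance(part, tuple) and len(part) == 2:
            yield part[1]


# Illustrative stand-in for the `data` returned by mail.fetch(num, '(RFC822)')
sample_data = [
    (b'1 (RFC822 {49}',
     b'Subject: hello\r\nFrom: a@example.com\r\n\r\nHi there\r\n'),
    b')',
]

messages = [email.message_from_bytes(raw) for raw in iter_message_bytes(sample_data)]
print(len(messages), messages[0]['Subject'])
```

This still names the part you want, just by shape rather than by index, which tends to hold up better if you later fetch several messages or several items in one call.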

As an aside, decoding the bytes as UTF-8 is absolutely the wrong thing to do. If you are moderately lucky, you are not doing any direct harm (that is, the message is trivially all ASCII, and any 8-bit data is hidden behind a content transfer encoding), but it's still wrong. You should instead call `email_message = email.message_from_bytes(raw_email)`.

Just to be explicit, if you are only slightly less lucky, the message contains 8-bit text which isn't UTF-8, and you will get a traceback with a UnicodeDecodeError. You have not yet examined the message so there is absolutely no way to correctly guess whether it contains character data at all, and if so, which encoding(s) it uses. Perhaps see also "What is character encoding and why should I bother with it".
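
To make the failure mode concrete, here is a small sketch using a hand-built message whose body is 8-bit ISO-8859-1. The `raw_email` bytes are invented for the example, and passing `policy=policy.default` is my choice here (it gives the newer `EmailMessage` API with `get_content()`), not something the code in the question uses.

```python
import email
from email import policy

# Invented sample: a Latin-1 body sent with an 8bit transfer encoding
raw_email = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

# Blindly decoding the whole message as UTF-8 fails on the \xe9 byte
try:
    raw_email.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Parsing the bytes lets the email package read the declared charset first
email_message = email.message_from_bytes(raw_email, policy=policy.default)
body = email_message.get_content()

print(decode_failed, repr(body.strip()))
```

The parser works from the headers of each part, so the charset is applied per body part rather than guessed once for the entire raw message.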

tripleee
  • Hi @tripleee, thank you for your reply, it works fine! I have followed the instructions and read through the character encoding question. I have included the modified code in the question above. Let me know if there is anything to fix. I am actually downloading the attachment from the email content. – suresh vignesh Dec 28 '20 at 05:13
  • I rolled back your edit; your question should remain strictly a question. If you want to post an answer of your own, feel free to do that instead. The code had some obvious indentation errors anyway, and some iffy handling of the Content-Disposition header. As an aside, there is no need to remove an existing file before overwriting it, though you might want to check for permission errors from `open` (it could fail if the file exists and you are not allowed to overwrite it). – tripleee Dec 28 '20 at 06:27
  • Thanks tripleee, I am trying to find an attachment in the mail, hence the logic based on the Content-Disposition header. Is there a better way to do it? I googled and found that code. – suresh vignesh Dec 28 '20 at 08:13
  • The presence or absence of Content-Disposition is not guaranteed; each type has a default disposition. Perhaps see also https://stackoverflow.com/questions/48562935/what-are-the-parts-in-a-multipart-email/48563281#48563281 – tripleee Dec 28 '20 at 08:27
  • Also https://stackoverflow.com/a/64008532/874188 has notes specific to Content-Disposition (so the code above them is not really correct). – tripleee Dec 28 '20 at 08:43
  • Thanks a lot for the suggestions , will look into it :) – suresh vignesh Dec 28 '20 at 12:14