Some background:
With Python 3.7.4, imaplib occasionally returns an email whose photo attachment (JPG) fails to decode after being downloaded from the server. I've confirmed over multiple emails that the photos are base64-encoded when sent. Most photos work; however, certain ones don't, and I can't tell why. At this time, I don't know which email client sent the particular email that causes the crash, or the source of the photo (phone, camera, PC, etc.). I've tested every file type supported by python-pillow without any issues, though. It's just this one photo/email. Lastly, if I remove the attachment there are no issues, so it's something to do with the photo. All Python packages are at their current versions.
The commented-out lines in the code below show what I've tried. I've tried the following encodings:
- utf-8 (which fails to decode it at all):
Traceback (most recent call last):
  File "(file path)", line 514, in DoEmail
    raw_email_string = raw_email.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 10922: invalid start byte
- cp1252 (which returns a NoneType object when trying to save the file):
Traceback (most recent call last):
    part.get_payload(decode=True))
TypeError: a bytes-like object is required, not 'NoneType'
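For reference, here is a minimal sketch of the first error in isolation. The message bytes are made up for illustration (they are not taken from the failing email), and the sketch assumes that the 0x92 byte is a cp1252 right single quotation mark somewhere in the message text. It also shows that email.message_from_bytes() accepts the raw bytes directly, so no up-front decode of the whole message is needed.

```python
import email

# Hypothetical raw message containing byte 0x92, which is not a valid
# UTF-8 start byte but maps to a right single quote (U+2019) in cp1252.
raw_email = b"Subject: demo\r\n\r\nIt\x92s attached.\r\n"

try:
    raw_email.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # same class of error as in the traceback above

# cp1252 can decode the byte, but this only affects the text, not attachments.
as_cp1252 = raw_email.decode('cp1252')

# Parsing the bytes directly avoids choosing a codec for the whole message.
msg = email.message_from_bytes(raw_email)
print(utf8_ok, msg['Subject'])
```
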
I've looked at the email.parser source, the email.parser docs, and the imaplib docs, as well as a good example by MattH and an attachment example by John Paul Hayes.
My Question:
Why do certain photos, even though they seem to be encoded correctly, cause it to crash? And how do I fix it? Is there a better method to get and save the attachments?
Relevant Code:
# Site is the email server address
# Port is the email server port, usually 993.
mail = imaplib.IMAP4_SSL(host=Site, port=Port) # imaplib module implements connection based on IMAPv4 protocol
mail.login(Email, password)
mail.select('inbox', readonly=False) # Connected to inbox.
# SearchPhrase is the phrase used when finding unique emails.
result, data = mail.uid('SEARCH', None, f'Subject "{SearchPhrase}"') # search and return uids instead
if result == 'OK':
    EmailIdList = data[0].split() # data[0] is a space-separated byte string of the ids
    count = len(EmailIdList)
    for x in range(count):
        if GUI:
            GUI.resultStatus = resx.currentProgress(x+1, count)
        latest_email_uid = EmailIdList[x] # unique ids wrt label selected
        EmailID = latest_email_uid.decode('utf-8')
        result, email_data = mail.uid('fetch', latest_email_uid, '(RFC822)')
        if result == 'OK':
            raw_email = email_data[0][1]
            # try:
            #     raw_email_string = raw_email.decode('utf-8')
            # except:
            #     raw_email_string = raw_email.decode('cp1252')
            # email_message = email.message_from_string(raw_email)
            email_message = email.message_from_bytes(raw_email)
            print(email_message)
            dt = parse(email_message['Date']) # dateutil.parser.parse()
            day = str(dt.strftime("%B %d, %Y")) # date())
            # msg.get_content_charset(), 'ignore').encode('utf8', 'replace')
            # this will loop through all the available multiparts in email
            for part in email_message.walk():
                charset = part.get_content_charset()
                if part.get_content_maintype() != 'multipart' and part.get('Content-Disposition') is not None:
                    fileName = part.get_filename().replace('\n','').replace('\r','')
                    if fileName != '' and fileName is not None:
                        print(fileName)
                        with open(fileName, 'wb') as f:
                            ######## ---- HERE ---- ##########
                            f.write(part.get_payload(decode=True))
                elif part.get_content_type() == "text/plain": # get only text/plain
                    body = str(part.get_payload(decode=True), str(charset), "ignore").replace('\r','')
                    print(body)
                elif part.get_content_type() == "text/html": # get only html
                    html = str(part.get_payload(decode=True), str(charset), "ignore").replace('\n', '').replace('\r', ' ')
                    print(html)
                else:
                    continue
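As a point of comparison, here is a hedged sketch of a more defensive save loop. It is not a confirmed fix for the failing email. It assumes the standard library's email.policy.default and Message.get_content_disposition(), and it skips any part whose get_payload(decode=True) comes back as None instead of passing None to f.write(). The function name save_attachments and the demo message are hypothetical.

```python
import email
import os
import tempfile
from email import policy
from email.message import EmailMessage


def save_attachments(raw_email: bytes, out_dir: str) -> list:
    """Parse raw RFC 822 bytes and write each attachment into out_dir."""
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    saved = []
    for part in msg.walk():
        # Container parts (multipart/*, message/rfc822) have no bytes payload.
        if part.is_multipart():
            continue
        if part.get_content_disposition() != 'attachment':
            continue
        name = part.get_filename()
        payload = part.get_payload(decode=True)
        # Guard against a missing filename or an undecodable payload.
        if not name or payload is None:
            continue
        path = os.path.join(out_dir, os.path.basename(name))
        with open(path, 'wb') as f:
            f.write(payload)
        saved.append(path)
    return saved


# Hypothetical demo message with a tiny fake JPEG attachment.
demo = EmailMessage()
demo['Subject'] = 'demo'
demo.set_content('hello')
demo.add_attachment(b'\xff\xd8\xff\xe0fake-jpeg', maintype='image',
                    subtype='jpeg', filename='photo.jpg')
raw = demo.as_bytes()

with tempfile.TemporaryDirectory() as tmp:
    print([os.path.basename(p) for p in save_attachments(raw, tmp)])
```

Using os.path.basename() on the filename is a design choice in this sketch, so that a filename containing path separators cannot write outside out_dir.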
Edit: I believe these are the MIME Headers for the image in question.
------=_NextPart_000_14A6_01D55B4C.3FE8C840
Content-Type: image/jpeg;
    name="8~a~0ff68d6a-12aa-49bf-9908-0b28ecd7ec83~634676194557918023.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
    filename="8~a~0ff68d6a-12aa-49bf-9908-0b28ecd7ec83~634676194557918023.jpg"
Edit: The crash occurs where the base64 data is decoded to save the file. That location is marked in the code by: ######## ---- HERE ---- ##########
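To narrow down which part triggers the TypeError, a small diagnostic along these lines might help. It is only a sketch: describe_parts is a hypothetical helper, and the demo message is constructed for illustration. It lists each part's content type, transfer encoding, and filename, and whether get_payload(decode=True) returns None. One case where the standard library returns None is a container part such as an attached message/rfc822, whose maintype is 'message' rather than 'multipart'. I have not confirmed that this is what happens with the failing email.

```python
import email
from email import policy
from email.message import EmailMessage


def describe_parts(raw_email: bytes) -> list:
    """Return one dict of diagnostic fields per MIME part."""
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    rows = []
    for part in msg.walk():
        cte = part.get('Content-Transfer-Encoding')
        rows.append({
            'content_type': part.get_content_type(),
            'cte': str(cte) if cte is not None else None,
            'filename': part.get_filename(),
            'is_multipart': part.is_multipart(),
            # True means f.write() would receive None for this part.
            'payload_is_none': part.get_payload(decode=True) is None,
        })
    return rows


# Hypothetical demo: one image attachment and one attached message.
inner = EmailMessage()
inner['Subject'] = 'forwarded'
inner.set_content('inner body')

outer = EmailMessage()
outer['Subject'] = 'outer'
outer.set_content('see attached')
outer.add_attachment(b'\xff\xd8\xff\xe0', maintype='image',
                     subtype='jpeg', filename='photo.jpg')
outer.add_attachment(inner, filename='forwarded.eml')

for row in describe_parts(outer.as_bytes()):
    print(row)
```
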