I regularly receive emails with attachments that I must extract and save to disk. I do essentially the following (in Python 2.7):
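import email.header  # also makes the top-level email package available
import sys

message = email.message_from_file(sys.stdin)
for part in message.walk():
    filename = part.get_filename()
    if filename is None:
        continue  # multipart containers and inline parts have no filename
    path = email.header.decode_header(filename)[0][0]
    content = part.get_payload(decode=True)  # undoes base64 / quoted-printable
    with open(path, 'wb') as f:  # binary mode: the payload is raw bytes
        f.write(content)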
This approach has worked for every type of attachment and every Content-Transfer-Encoding I've received so far, except when the attachment is a ZIP file and the Content-Transfer-Encoding is 'quoted-printable'. In those cases the ZIP file that gets written is one byte shorter than the original (the missing byte is around 60-80% of the way through the file), and unzip reports errors like:
% unzip -l foo.zip
Archive: foo.zip
error [foo.zip]: missing 1 bytes in zipfile
(attempting to process anyway)
  Length      Date    Time    Name
---------  ---------- -----   ----
   440228  01-00-1980 00:00   foo - bar.csv
---------                     -------
   440228                     1 file
and
% unzip foo.zip
Archive: foo.zip
error [foo.zip]: missing 1 bytes in zipfile
(attempting to process anyway)
error [foo.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
(attempting to re-compensate)
inflating: foo - bar.csv bad CRC 4c86de66 (should be a53f73b1)
The extracted CSV then differs in size from the original by about 0.01%, and roughly the final 20-40% of the file is garbled.
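To pin down exactly which byte goes missing, a byte-by-byte comparison against a known-good copy of the ZIP could help. The following is only a minimal sketch: `first_difference` and `compare_files` are hypothetical helper names, and it assumes a good copy of the attachment (for example, one saved by the regular email reader) is available on disk.

```python
def first_difference(expected, actual):
    """Return the offset of the first differing byte, or None if identical."""
    shorter = min(len(expected), len(actual))
    for i in range(shorter):
        # Slicing keeps the comparison bytes-to-bytes on Python 2 and 3.
        if expected[i:i + 1] != actual[i:i + 1]:
            return i
    if len(expected) != len(actual):
        return shorter  # one input is a strict prefix of the other
    return None


def compare_files(good_path, bad_path, width=8):
    """Report the first mismatch and a few bytes of context on each side."""
    with open(good_path, 'rb') as f:
        good = f.read()
    with open(bad_path, 'rb') as f:
        bad = f.read()
    offset = first_difference(good, bad)
    if offset is None:
        return None
    start = max(0, offset - width)
    return offset, good[start:offset + width], bad[start:offset + width]
```

Calling `compare_files('good.zip', 'foo.zip')` (both paths are placeholders) would return the offset together with the surrounding bytes from each file, which should show what the dropped byte was.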
Now, the code handles ZIP files attached as 'base64' just fine, and it handles other content (Excel files, CSV files) attached as 'quoted-printable' just fine. I also know that the ZIP attachment is intact in the message itself, because my regular email reader can save it to disk and extract the original content flawlessly. (Is it possible that real email readers perform some error correction when saving the attachment that my Python code does not?)
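One hypothesis that would fit a single missing byte (an assumption, not a confirmed diagnosis) is line-ending handling: if the sender's encoder left a CR LF pair from the binary data as a literal line break instead of writing `=0D=0A`, then anything that rewrites CRLF as LF before decoding would silently drop one byte. The sketch below uses made-up payloads and a hypothetical `normalize_newlines` helper to show the effect with the standard `quopri` module.

```python
import quopri

# Hypothetical quoted-printable bodies for the same six data bytes plus "rest".
# In literal_crlf the CR LF is data that the encoder did not escape;
# in escaped_crlf the same two bytes are written as =0D=0A.
literal_crlf = b"PK=03=04\r\nrest"
escaped_crlf = b"PK=03=04=0D=0Arest"


def normalize_newlines(encoded):
    """Mimic a transport or text-mode read that turns CRLF into LF."""
    return encoded.replace(b"\r\n", b"\n")


intact = quopri.decodestring(escaped_crlf)
damaged = quopri.decodestring(normalize_newlines(literal_crlf))

print("intact: %d bytes, damaged: %d bytes" % (len(intact), len(damaged)))
```

If this is what is happening, the escaped form survives newline normalization unchanged, while the literal form comes out one byte short at the position of the original CR.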
Is there a known issue with Python being unable to decode ZIP files sent as quoted-printable? Are there other functions in Python's email package I can try in order to decode this content correctly?
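As a starting point for experimenting, the still-encoded payload can be fetched with `get_payload(decode=False)` and decoded by hand, so the results can be compared with what `get_payload(decode=True)` returns. This is a rough sketch only: the message in `RAW` is invented for illustration, `decode_three_ways` is a hypothetical helper, and `quopri` and `binascii` are standard-library modules outside the email package.

```python
import binascii
import email
import quopri

# Invented single-part message standing in for a real attachment part.
RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: application/zip\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "Content-Disposition: attachment; filename=\"foo.zip\"\n"
    "\n"
    "PK=03=04=00=FF\n"
)


def decode_three_ways(part):
    """Decode one quoted-printable part with three different entry points."""
    raw = part.get_payload(decode=False)  # still quoted-printable text
    if not isinstance(raw, bytes):
        raw = raw.encode('ascii')  # Python 3 returns str here
    return {
        'get_payload': part.get_payload(decode=True),
        'quopri': quopri.decodestring(raw),
        'binascii': binascii.a2b_qp(raw),
    }


part = email.message_from_string(RAW)
results = decode_three_ways(part)
for name in sorted(results):
    print("%s: %d bytes" % (name, len(results[name])))
```

If all three results agree for a failing message, the byte is probably lost before decoding (for example, while the message is read or transported) rather than inside the quoted-printable decoder.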