48

The only way I came up with for deleting a file from a zip file was to create a temporary zip file without the file to be deleted and then rename it to the original filename.

In Python 2.4 the ZipInfo class had an attribute file_offset, so it was possible to create a second zip file and copy the data to it without decompressing and recompressing.

This file_offset is missing in Python 2.6, so is there another option besides creating another zip file by uncompressing every file and then recompressing it again?

Is there maybe a direct way of deleting a file in the zip file? I searched and didn't find anything.

Anthon
RSabet
  • I found this thread on the Python bug tracker discussing the difficulties of removing files from a zip file: https://bugs.python.org/issue6818 – Elias Zamaria Dec 09 '16 at 21:26

4 Answers

55

The following snippet worked for me (deletes all *.exe files from a Zip archive):

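import zipfile

# Copy every entry except the *.exe files into a new archive.
zin = zipfile.ZipFile('archive.zip', 'r')
zout = zipfile.ZipFile('archive_new.zip', 'w')
for item in zin.infolist():
    buffer = zin.read(item.filename)
    if not item.filename.endswith('.exe'):
        # writestr() compresses the data again
        zout.writestr(item, buffer)
zout.close()
zin.close()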

If you read everything into memory, you can eliminate the need for a second file. However, this snippet recompresses everything.
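A minimal sketch of the in-memory variant, assuming the whole archive fits comfortably in RAM. The names `strip_members` and `should_drop` are made up for illustration. It still recompresses every kept entry, but it checks each name before reading, so dropped entries are never decompressed.

```python
import io
import os
import tempfile
import zipfile


def strip_members(path, should_drop):
    """Rewrite the archive at `path` in place, skipping unwanted entries.

    `should_drop` is a hypothetical callback: it receives an entry name and
    returns True if that entry should be left out.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(path, "r") as zin, zipfile.ZipFile(buffer, "w") as zout:
        for item in zin.infolist():
            # Check the name first so dropped entries are never decompressed.
            if should_drop(item.filename):
                continue
            # writestr() recompresses the data using the entry's own settings.
            zout.writestr(item, zin.read(item.filename))
    # Overwrite the original only after the new archive is complete.
    with open(path, "wb") as f:
        f.write(buffer.getvalue())


# Demo on a throwaway archive (file names and contents are invented).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "archive.zip")
    with zipfile.ZipFile(path, "w") as z:
        z.writestr("a.txt", "hello")
        z.writestr("tool.exe", "pretend binary")
    strip_members(path, lambda name: name.lower().endswith(".exe"))
    with zipfile.ZipFile(path) as z:
        remaining = z.namelist()
print(remaining)
```

The final write is not atomic: if the process dies while overwriting, the archive is lost. Writing to a temporary file and renaming it, as in the question, avoids that at the cost of the second file.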

On closer inspection, ZipInfo.header_offset is the offset of an entry's local file header from the start of the file. The name is misleading, because the main Zip header (the central directory) is actually stored at the end of the file. My hex editor confirms this.

So the problem you'll run into is the following: You need to delete the directory entry in the main header as well or it will point to a file that doesn't exist anymore. Leaving the main header intact might work if you keep the local header of the file you're deleting as well, but I'm not sure about that. How did you do it with the old module?

Without modifying the main header I get an error "missing X bytes in zipfile" when I open it. This might help you to find out how to modify the main header.
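The layout described above can be checked without a hex editor. The following is only a sketch on a throwaway in-memory archive (entry names and contents are invented). It assumes an archive without a trailing comment and without ZIP64 extensions, so the end-of-central-directory record is exactly the last 22 bytes.

```python
import io
import struct
import zipfile

# Build a small throwaway archive in memory.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "hello " * 100)
    zf.writestr("b.exe", "pretend binary " * 100)
data = raw.getvalue()

with zipfile.ZipFile(io.BytesIO(data)) as zf:
    offsets = {info.filename: info.header_offset for info in zf.infolist()}

# Each header_offset points at a local file header, which starts with PK\x03\x04.
local_headers_ok = all(
    data[offset:offset + 4] == b"PK\x03\x04" for offset in offsets.values()
)

# The end-of-central-directory record (PK\x05\x06) sits at the end of the file
# and records where the central directory (the "main header") starts.
eocd_pos = data.rfind(b"PK\x05\x06")
fields = struct.unpack("<4s4H2LH", data[eocd_pos:eocd_pos + 22])
entry_count, cd_size, cd_offset = fields[4], fields[5], fields[6]

print(offsets, entry_count, cd_offset, eocd_pos)
```

Deleting an entry in place would therefore mean cutting out its local header and data, dropping its record from the central directory, and correcting the stored offsets of every entry that follows, which is why simply removing bytes leaves an archive that fails to open.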

mdm
  • 2
    Thanks, but if I am not wrong, a look at zipfile.writestr shows that this just recompresses the data. It would be much faster to copy the already compressed files without uncompressing and then compressing them again. – RSabet Feb 05 '09 at 19:07
  • @RSabet I agree with mdm that unzip-and-rezip is the only viable option so far. By the way, mdm's code helps, but it is better to use os.path.splitext() for anything more serious. – RayLuo Mar 20 '13 at 08:18
  • 1
    Also, you could avoid extracting the executable files: check the name first, and read the entry only if it is not an executable. That would save some useless extraction time. – Jean-François Fabre Sep 22 '16 at 11:32
10

Not very elegant but this is how I did it:

import subprocess
import zipfile

# zip_filename is assumed to hold the path to the archive
with zipfile.ZipFile(zip_filename) as z:
    files_to_del = [f for f in z.namelist() if f.endswith('.exe')]

# let the external zip tool delete the entries in place
cmd = ['zip', '-d', zip_filename] + files_to_del
subprocess.check_call(cmd)

# reload the modified archive
z = zipfile.ZipFile(zip_filename)
Kurt
  • 1
    This is what I ended up doing. Ugly, but `ZipFile` just doesn't seem to have a way of deleting or updating/replacing files. – ArtOfWarfare Jan 26 '18 at 18:04
  • 1
    This solution is platform specific and/or requires `zip` software to be installed on OS. Moreover, the overhead of a new subprocess is introduced. – Buzz Aug 16 '22 at 08:47
7

Based on Elias Zamaria's comment on the question.

Having read through Python issue #51067, I want to give an update on it.

As of today, a solution already exists, though it has not been accepted into Python because the author has not signed the Contributor Agreement.

Nevertheless, you can take the code from https://github.com/python/cpython/blob/659eb048cc9cac73c46349eb29845bc5cd630f09/Lib/zipfile.py and save it as a separate file in your project. After that, reference it from your project instead of the built-in Python library: import myproject.zipfile as zipfile.

Usage:

# zipfile here is the patched copy, not the standard library module
with zipfile.ZipFile("archive.zip", "a") as z:
    z.remove("firstfile.txt")

I believe it will be included in future Python versions. For me it works like a charm for the given use case.
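If you go the vendored-module route, a small guard can make the dependency explicit. This is only a sketch: `HAS_REMOVE` and `remove_member` are names I made up, and it assumes that `remove()` accepts the member name as in the usage above. With a `zipfile` module that lacks the method, it fails with a clear message instead of an AttributeError deeper in the code.

```python
import zipfile  # swap for `import myproject.zipfile as zipfile` when using the patched copy

# True only if the zipfile module in use provides ZipFile.remove().
HAS_REMOVE = callable(getattr(zipfile.ZipFile, "remove", None))


def remove_member(path, name):
    """Delete `name` from the archive at `path` using ZipFile.remove().

    Raises NotImplementedError when the method is unavailable, so the caller
    can fall back to rewriting the archive.
    """
    if not HAS_REMOVE:
        raise NotImplementedError(
            "this zipfile module has no ZipFile.remove(); "
            "use the patched copy or rewrite the archive instead"
        )
    with zipfile.ZipFile(path, "a") as z:
        z.remove(name)
```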

buhtz
Kyrylo Kravets
  • Seems to be broken for .jar files, sometimes it deletes everything instead of the file you wanted – Maksiks Aug 17 '23 at 18:59
6

The routine delete_from_zip_file from ruamel.std.zipfile¹ allows you to delete a file based on its full path within the ZIP, or based on (re) patterns. E.g. you can delete all of the .exe files from test.zip using

from ruamel.std.zipfile import delete_from_zip_file

delete_from_zip_file('test.zip', pattern='.*.exe')  

(please note the dot before the *).
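To see why the dot matters, here is a short sketch using Python's `re` module directly. I am assuming the pattern is matched roughly like `re.match` does; check the package documentation for the exact behaviour. The file names are made up.

```python
import re

names = ["setup.exe", "notes_exe", "readme.txt"]

# '.*.exe': both unescaped dots match any character, so "notes_exe" matches too.
loose = [n for n in names if re.match(".*.exe", n)]

# Escaping the dot and anchoring the end restricts the match to a real ".exe" suffix.
strict = [n for n in names if re.match(r".*\.exe$", n)]

# Glob style '*.exe' is not a valid regular expression: '*' has nothing to repeat.
try:
    re.compile("*.exe")
    glob_style_is_valid = True
except re.error:
    glob_style_is_valid = False

print(loose, strict, glob_style_is_valid)
```

If stray matches such as `notes_exe` are a concern, the stricter pattern `r'.*\.exe$'` is the safer choice.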

This works similarly to mdm's solution (including the need for recompression), but recreates the ZIP file in memory (using the class InMemZipFile()), overwriting the old file after it is fully read.


¹ Disclaimer: I am the author of that package.

Anthon
  • The delete_from_zip_file routine is very useful for me, but I'm getting this error while trying to remove many files from a big archive (~3 GB in size) with a bunch of folders: "LargeZipFile: Zipfile size would require ZIP64 extensions". I guess ruamel.std.zipfile needs to be modified in its __init__.py file (something like allowZip64 = True for zipfile.ZipFile(..)), right? – lugger1 Mar 30 '18 at 18:58
  • I have never worked with `allowZip64`, no idea what it is about. – Anthon Mar 30 '18 at 20:13
  • 1
    Easiest solution for small implications – Maksiks Aug 17 '23 at 17:54