57

I just read about zip bombs, i.e. zip files that contain a very large amount of highly compressible data (00000000000000000...).

When opened they fill the server's disk.

How can I detect that a zip file is a zip bomb before unzipping it?

UPDATE: Can you tell me how this is done in Python or Java?

flybywire
  • 1
    The compression ratio can be something like 1000 to 1. Not only does it consume a lot of disk space, it also takes a long time to write the output. – sharptooth Sep 22 '09 at 09:40
  • 1
    [Related question about gzip and bzip2](http://stackoverflow.com/questions/13622706/how-to-protect-myself-from-a-gzip-or-bzip2-bomb). – Joachim Breitner Nov 29 '12 at 09:42

7 Answers

26

Try this in Python:

import zipfile

# Sum the uncompressed sizes declared in the archive's directory (nothing is extracted).
with zipfile.ZipFile('a_file.zip') as z:
    print(f'total files size={sum(e.file_size for e in z.infolist())}')
Martin Thoma
Nick Dandoulakis
  • 7
    At least with gzip I think the uncompressed size might not be in the header (so it might work with zip, but not with .tar.gz) – tonfa Sep 22 '09 at 12:30
  • @tonfa, thanks for mentioning that zipfile doesn't handle gnu zip format. – Nick Dandoulakis Sep 22 '09 at 13:05
  • 4
    IIRC, Zip standard (and let's face it, if you want to cause a DoS, you are necessarily going to follow standards) allows certain sizes to be elided from the central directory and entry headers. – Tom Hawtin - tackline Sep 22 '09 at 13:14
  • 20
    The most famous zip bomb will pass this test because the first level is not very big. You need to check ZIP depth (ZIP inside ZIP) also. – ZZ Coder Sep 22 '09 at 14:59
  • 2
    @ZZ Coder, hmm that's true. Tom Hawtin - tackline's solution is better in case you decompress all levels at once. – Nick Dandoulakis Sep 22 '09 at 15:25
  • 2
    @Kevin, you ask if the unzipping procedure does actually verify the "size" attribute? Good point. If not, then the above code can "fail", of course. – Nick Dandoulakis Nov 30 '11 at 16:48
  • @SAN3 check out http://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipFile.html#entries() – Nick Dandoulakis Jan 31 '13 at 05:19
  • What if you include a zip bomb within a normal zip? – Tom Gullen Sep 07 '15 at 14:13
  • 1
    @ZZCoder When you don't decompress the ZIP file recursively (meaning decompressing inner ZIP files of the initial ZIP) you are not affected by multi-layered zip bombs, are you? I can't really imagine why someone would want to decompress recursively, but I guess there will be some use case for that... – user2173353 Nov 13 '15 at 12:39
  • Maybe this will be useful to someone: ZipFile accepts not only the path to the archive but also any file-like object, so you can check the archive without even saving it. – foske Jul 05 '21 at 16:03
25

Zip is, erm, an "interesting" format. A robust solution is to stream the data out, and stop when you have had enough. In Java, use ZipInputStream rather than ZipFile. The latter also requires you to store the data in a temporary file, which is also not the greatest of ideas.
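
For the Python side of the question, the "stream it out and stop when you have had enough" idea might be sketched as below. This is only a rough sketch under some assumptions: the standard library has no direct counterpart to Java's ZipInputStream, so it goes through `zipfile.ZipFile.open()` (which still needs a seekable source and reads the directory), and it counts the bytes actually produced instead of trusting any declared size. The names `measure_expansion` and `ExpansionLimitExceeded` and the 100 MiB budget are made up for illustration.

```python
import zipfile


class ExpansionLimitExceeded(Exception):
    """Raised when decompressed output exceeds the allowed budget."""


def measure_expansion(source, max_total_bytes=100 * 1024 * 1024, chunk_size=64 * 1024):
    """Decompress every member in chunks, discarding the data.

    Returns the total number of bytes produced, or raises
    ExpansionLimitExceeded as soon as the running total passes the budget.
    """
    total = 0
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            with archive.open(info) as member:
                while True:
                    chunk = member.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
                    if total > max_total_bytes:
                        raise ExpansionLimitExceeded(
                            f"stopped at {total} bytes while reading {info.filename!r}"
                        )
    return total
```

In a real extractor the same loop would write each chunk to its destination instead of discarding it, so the archive is only decompressed once.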

Tom Hawtin - tackline
  • This is old, I know, but still: How come it matters whether you're reading a file or an input stream? To my understanding, you can read both types using an iterative approach, stopping when you've read a certain amount of bytes or reached a certain number of iterations. – Asger Skov Velling Oct 13 '22 at 11:01
  • 1
    @AsgerSkovVelling It is quite old. `ZipFile` requires you to download the entire thing to read the directory, which is at the end. (Zip was designed for archiving, not retrieval. Files can be streamed out, and then when all the indexes are known, the directory is written.) Perhaps the worst problem is that you need all of the archive available all at once. There is an additional check you need to make sure that the entire compressed archive isn't too large. Also, if you read the directory then it may direct you to read the same file data repeatedly. Files can even overlap. – Tom Hawtin - tackline Oct 14 '22 at 12:37
  • 1
    @AsgerSkovVelling Oh, and there's also the issue of Gifar, which followed a series of vulnerabilities in IE and Flash. If you go through the directory, the front of the file may be something else. `ZipInputStream` will check that the file starts with the magic number for a local header. https://en.wikipedia.org/wiki/Gifar – Tom Hawtin - tackline Oct 14 '22 at 17:33
13

Reading over the description on Wikipedia -

Deny any compressed files that contain compressed files.
     Use ZipFile.entries() to retrieve a list of files, then ZipEntry.getName() to find the file extension.
Deny any compressed files that contain files over a set size, or the size can not be determined at startup.
     While iterating over the files use ZipEntry.getSize() to retrieve the file size.
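
Since the question also asks about Python, the two rules above could be approximated with `zipfile.ZipFile.infolist()`, `ZipInfo.filename` and `ZipInfo.file_size`, which play roughly the roles of `ZipFile.entries()`, `ZipEntry.getName()` and `ZipEntry.getSize()`. This is a minimal sketch, not a complete defence: the function name `check_zip_policy`, the extension list and the 50 MiB per-entry limit are all assumptions, and the sizes are whatever the archive declares about itself.

```python
import os
import zipfile

# Assumed list of extensions treated as "compressed files"; adjust to taste.
ARCHIVE_EXTENSIONS = {".zip", ".gz", ".bz2", ".xz", ".7z", ".rar", ".tar", ".tgz"}


def check_zip_policy(source, max_entry_bytes=50 * 1024 * 1024):
    """Return a list of reasons to reject the archive (empty if none found)."""
    problems = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            # Rule 1: deny archives that contain other compressed files.
            extension = os.path.splitext(info.filename)[1].lower()
            if extension in ARCHIVE_EXTENSIONS:
                problems.append(f"nested archive: {info.filename}")
            # Rule 2: deny entries whose declared size is over the limit.
            if info.file_size > max_entry_bytes:
                problems.append(
                    f"entry too large: {info.filename} ({info.file_size} bytes declared)"
                )
    return problems
```

An extension check is easy to evade by renaming the inner file, so this is best treated as a cheap first filter in front of a hard limit on the bytes actually written.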

Michael Lloyd Lee mlk
  • 1
    `getSize` lies. There is the size claimed in the directory, an entirely different size in the local header, and then a different size again when you actually come to decompress. Also, I don't know what file types are compressed (images?) and it's typical for files to be sent over a compressed link (HTTP often has gzip compression). – Tom Hawtin - tackline Oct 14 '22 at 12:41
6

Don't allow the upload process to write enough data to fill up the disk, i.e. solve the problem itself, not just one possible cause of the problem.

Pete Kirkham
5

Check a zip header first :)

Vladislav Rastrusny
4

If the ZIP decompressor you use can provide the data on original and compressed size you can use that data. Otherwise start unzipping and monitor the output size - if it grows too much cut it loose.
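
In Python, the "original and compressed size" data is exposed as `ZipInfo.file_size` and `ZipInfo.compress_size`, so a ratio check might look roughly like this. It is only a sketch: the function name `suspicious_entries` and the 100:1 threshold are assumptions, and both numbers come from the archive's own metadata, so monitoring the real output size is still needed as a backstop.

```python
import zipfile


def suspicious_entries(source, max_ratio=100.0):
    """Return (filename, ratio) pairs whose declared expansion exceeds max_ratio."""
    flagged = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.compress_size == 0:
                # Empty files and directories are fine; declared data with
                # zero compressed bytes is inconsistent, so flag it.
                if info.file_size > 0:
                    flagged.append((info.filename, float("inf")))
                continue
            ratio = info.file_size / info.compress_size
            if ratio > max_ratio:
                flagged.append((info.filename, ratio))
    return flagged
```

Legitimate data such as logs or sparse images can also compress very well, so the threshold is a judgment call rather than a fixed rule.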

sharptooth
1

Make sure you are not using your system drive for temp storage. I am not sure whether a virus scanner will check the file when it encounters it.

Also, you can look at the information inside the zip file and retrieve a list of its contents. How to do this depends on the utility used to extract the file, so you need to provide more information here.

Heiko Hatzfeld