
I'm creating a backup routine for my application in Java. However, when the zip file is over 4GB, or has more than approximately 65,000 files, the resulting zip file is corrupted.

I'm also testing Apache Commons Compress to create tar.gz archives, but it has a file name limit of 100 characters. I wanted to test this API for zip compression too, but first I'd like to know what exactly the problem with Java's zip is.

So, the real question is: am I doing something wrong, is it a limit of Java's zip implementation, or is it a limit of the zip format itself?

Thanks.

caarlos0
  • You really have an eye for the approximately 65,000 files. Didn't that ring a 2-byte bell? :) – Lazlo Jul 18 '11 at 21:54
  • Usually, when people see numbers like 255/256, 65535/65536, 2,147,483,647/2,147,483,648 or the like, they know they are dealing with 1 byte, 2 bytes (short) or 4 bytes (int). If you're near one of these numbers, you can suspect that you've hit a byte-size limit. – Lazlo Jul 19 '11 at 15:33
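A small sketch of the intuition in these comments: it prints the largest unsigned values that fit in 1, 2 and 4 bytes, and shows what survives when 70,000 is written into a 2-byte field (only the low 16 bits are kept). The class and method names are made up for illustration; the comment's point is simply that a count or size which outgrows its field silently wraps around.

```java
public class ByteLimits {
    // Largest unsigned value that fits in the given number of bytes (valid for 1 to 7 bytes)
    static long maxUnsigned(int bytes) {
        return (1L << (8 * bytes)) - 1;
    }

    // What a 2-byte (16-bit) field holds after writing 'value': only the low 16 bits survive
    static int storeIn16Bits(int value) {
        return value & 0xFFFF;
    }

    public static void main(String[] args) {
        System.out.println(maxUnsigned(1));        // 255
        System.out.println(maxUnsigned(2));        // 65535
        System.out.println(maxUnsigned(4));        // 4294967295 (4 GiB - 1)
        System.out.println(storeIn16Bits(70_000)); // 4464, i.e. 70000 - 65536
    }
}
```

A reader that trusts a wrapped field like this sees far fewer entries (or bytes) than were actually written, which is one way an archive can look corrupted.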

4 Answers


Quoting from Wikipedia:

The original ZIP format had a 4 GiB limit on various things (uncompressed size of a file, compressed size of a file and total size of the archive), as well as a limit of 65535 entries in a ZIP archive.
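A rough sketch of those classic limits as a check, assuming the original format's 4-byte size/offset fields and 2-byte entry-count field. The class and method names are hypothetical, the total archive size is used as a stand-in for the largest offset, and real writers may switch to ZIP64 slightly earlier (for example when a field would otherwise hold its all-ones marker value).

```java
public class ClassicZipLimits {
    // Classic ZIP stores sizes and offsets in 4-byte fields and the entry count in 2-byte fields
    static final long MAX_SIZE = 0xFFFFFFFFL; // 4 GiB - 1 bytes
    static final long MAX_ENTRIES = 0xFFFFL;  // 65535 entries

    // True if an archive with these properties goes past the limits quoted above
    static boolean exceedsClassicLimits(long largestEntrySize, long archiveSize, long entryCount) {
        return largestEntrySize > MAX_SIZE
                || archiveSize > MAX_SIZE
                || entryCount > MAX_ENTRIES;
    }

    public static void main(String[] args) {
        System.out.println(exceedsClassicLimits(1_000L, 3_000_000_000L, 60_000L)); // false
        System.out.println(exceedsClassicLimits(1_000L, 5_000_000_000L, 60_000L)); // true: archive over 4 GiB
        System.out.println(exceedsClassicLimits(1_000L, 100_000_000L, 70_000L));   // true: too many entries
    }
}
```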

and about ZIP64:

Java's built-in java.util.zip does not support it as of September 2010, but it has been added to OpenJDK and is planned for inclusion in Java 7.
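ZIP64 support did ship in Java 7, and since then `ZipOutputStream` adds the ZIP64 records by itself when an archive needs them. A minimal sketch, assuming Java 8 or later (for `UncheckedIOException`): it writes 70,000 empty entries, more than the classic 65535 limit, to a temporary file and reads the entry count back with `ZipFile`. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class Zip64EntriesDemo {
    // Writes 'count' empty entries to a temporary zip file and returns
    // the number of entries ZipFile finds when it reads the archive back.
    static int writeAndCount(int count) {
        try {
            File target = File.createTempFile("zip64-demo", ".zip");
            target.deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(target))) {
                for (int i = 0; i < count; i++) {
                    out.putNextEntry(new ZipEntry("file-" + i + ".txt"));
                    out.closeEntry(); // empty entry; a real backup would write file data here
                }
            } // closing the stream writes the central directory, plus ZIP64 records when needed
            try (ZipFile zip = new ZipFile(target)) {
                return zip.size(); // entry count as read from the central directory
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndCount(70_000)); // 70000, more than the classic 65535 limit
    }
}
```

On a JDK older than 7 the same program would run into the limits from the first quote, which matches the corruption described in the question.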

MRAB
  • What is the "it" that's unsupported in the second cut-n-paste quote? (EDIT: apparently ZIP64) – Steve Perkins Jul 18 '11 at 20:15
  • [ZIP64](http://en.wikipedia.org/wiki/ZIP_(file_format)#ZIP64), a set of extensions to get around the limitations mentioned in the first quote. – Jonik Jul 18 '11 at 20:18

It's a bug that's reported as fixed in Java 7: http://bugs.sun.com/view_bug.do?bug_id=4681995

One of the commenters on that ticket mentions TrueZIP as a workaround.

NPE
  • Hmm... I tried Chilkat (http://www.chilkatsoft.com/) and Apache Commons... does anyone know them? Are they any good? – caarlos0 Jul 18 '11 at 20:25

There's a 4GB file size limit on standard Zip files.

See the Wikipedia entry on zip files for more info... apparently you can get much, much larger files if you use the ZIP64 format.

P.S. If you find yourself trying to back up more than 4GB of data at a time, perhaps you should consider a different approach? Maybe something that takes a versioned filesystem snapshot would be more appropriate.

mikera
  • Hmm, thanks. I suspected that already. As for versioning: customers sometimes need to download the backups, so versioning is not the best solution for me. Still, thanks for the info. – caarlos0 Jul 18 '11 at 20:22

Yes, there is a limit. If you go through the entries like this:

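        // Counts the entries of an opened ZipFile by walking entries()
        // (needs java.util.Enumeration, java.util.zip.ZipEntry, java.util.zip.ZipFile)
        static int countEntries(ZipFile zipFile) {
            int count = 0;
            for (Enumeration<? extends ZipEntry> e = zipFile.entries(); e.hasMoreElements(); ) {
                e.nextElement(); // only the count matters here
                count++;
            }
            return count;
        }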

it will count up to 65535 entries and no more.

user1712200