17

The java.nio package has a beautiful way of handling zip files: it can treat them as file systems, which lets us handle the contents of a zip file like regular files. Thus, zipping a whole folder can be achieved by simply using Files.copy to copy all the files into the zip file. Since subfolders are to be copied as well, we need a visitor:

private static class CopyFileVisitor extends SimpleFileVisitor<Path> {
    private final Path targetPath;
    private Path sourcePath = null;

    public CopyFileVisitor(Path targetPath) {
        this.targetPath = targetPath;
    }

    @Override
    public FileVisitResult preVisitDirectory(final Path dir,
            final BasicFileAttributes attrs) throws IOException {
        if (sourcePath == null) {
            sourcePath = dir;
        } else {
            Files.createDirectories(targetPath.resolve(
                    sourcePath.relativize(dir).toString()));
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(final Path file,
            final BasicFileAttributes attrs) throws IOException {
        Files.copy(file,
                targetPath.resolve(sourcePath.relativize(file).toString()),
                StandardCopyOption.REPLACE_EXISTING);
        return FileVisitResult.CONTINUE;
    }
}

This is a simple "copy directory recursively" visitor. With the ZipFileSystem, however, we can also use it to copy a directory into a zip file, like this:

public static void zipFolder(Path zipFile, Path sourceDir) throws ZipException, IOException {
    // Initialize the zip file system and get its root
    Map<String, String> env = new HashMap<>();
    env.put("create", "true");
    URI uri = URI.create("jar:" + zipFile.toUri());
    try (FileSystem fileSystem = FileSystems.newFileSystem(uri, env)) {
        Path root = fileSystem.getRootDirectories().iterator().next();

        // Simply copy the directory into the root of the zip file system;
        // closing the file system (via try-with-resources) writes the archive
        Files.walkFileTree(sourceDir, new CopyFileVisitor(root));
    }
}
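
For reference, a call might look like this (the paths are just placeholders):

Path zipFile = Paths.get("/tmp/archive.zip");      // placeholder output path
Path sourceDir = Paths.get("/tmp/folder-to-zip");  // placeholder input directory
zipFolder(zipFile, sourceDir);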

This is what I call an elegant way of zipping a whole folder. However, when using this method on a huge folder (around 3 GB) I receive an OutOfMemoryError (heap space). When using a usual zip handling library, this error is not raised. Thus, it seems that the way the ZipFileSystem handles the copy is very inefficient: too much of the data to be written is kept in memory, so the OutOfMemoryError occurs.

Why is this the case? Is using ZipFileSystem generally considered inefficient (in terms of memory consumption) or am I doing something wrong here?

gexicide

2 Answers

33

I looked at ZipFileSystem.java and I believe I found the source of the memory consumption. By default, the implementation uses a ByteArrayOutputStream as the buffer for compressing the files, which means it is limited by the amount of memory assigned to the JVM.

There's an (undocumented) option we can put into the env map to make the implementation use temporary files instead ("useTempFile"). It works like this:

Map<String, Object> env = new HashMap<>();
env.put("create", "true");
env.put("useTempFile", Boolean.TRUE);

More details here: http://www.docjar.com/html/api/com/sun/nio/zipfs/ZipFileSystem.java.html, interesting lines are 96, 1358 and 1362.
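
Putting this together with the zipFolder method from the question, a minimal sketch could look like this (it reuses the CopyFileVisitor from the question and the same imports; the env map is declared as Map<String, Object> so that it can hold the Boolean value):

public static void zipFolder(Path zipFile, Path sourceDir) throws IOException {
    Map<String, Object> env = new HashMap<>();
    env.put("create", "true");            // note: a String
    env.put("useTempFile", Boolean.TRUE); // note: a Boolean; buffers compressed data in temp files
    URI uri = URI.create("jar:" + zipFile.toUri());
    try (FileSystem fileSystem = FileSystems.newFileSystem(uri, env)) {
        Path root = fileSystem.getRootDirectories().iterator().next();
        Files.walkFileTree(sourceDir, new CopyFileVisitor(root));
    }
}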

Diego Giagio
  • Thank you so much for your investigation on this. Observing the temp directory with `useTempFile=TRUE` while compressing the files in parallel (using http://goo.gl/woa0Ab), it seems that each file is zipped independently, in parallel, into a separate compressed temporary file, and all of those are then concatenated into one file, which is then atomically renamed to the archive name. What a shame this is not documented, and an even bigger shame that there is still no streaming parallel zip in the Java standard library. – Tomáš Dvořák Apr 03 '15 at 10:03
  • Thanks for the answer :) But are the temp files deleted afterwards? – VitalyT Aug 31 '18 at 17:08
  • @VitalyT, yes they are deleted, search for `tmppaths` in the [source](https://github.com/openjdk/jdk/blob/master/src/jdk.zipfs/share/classes/jdk/nio/zipfs/ZipFileSystem.java) – Marcono1234 Apr 28 '19 at 14:46
  • Honestly, the worst part is that `useTempFile` has to be a `Boolean` with the value `true`, whereas `create` has to be a `String` with the value `"true"`. Unbelievable that this even made it into the standard. – Philipp Jul 16 '20 at 01:39
-3

You must prepare the JVM to allow that amount of memory with -Xms {memory} -Xmx {memory}.

I recommend you check the directory's size on disk first and set a limit: under 1 GB use a memory file system, over 1 GB use a disk file system.
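
For instance, a rough size check could look like this (a sketch only; directorySize is an illustrative helper and the 1 GB threshold is arbitrary):

static long directorySize(Path dir) throws IOException {
    // Sum the sizes of all regular files under dir
    try (java.util.stream.Stream<Path> files = Files.walk(dir)) {
        return files.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            return 0L; // ignore files we cannot read
                        }
                    })
                    .sum();
    }
}

// ...
boolean overOneGb = directorySize(sourceDir) > 1024L * 1024 * 1024;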

Another thing: check the concurrency of the method; you won't want more than one thread zipping 3 GB of files.

  • Sorry, but this answer does not help at all. 1) I know how to increase the heap size; this is not the question. 2) What is a "memory file system" vs. a "disk file system"? 3) The method is not concurrent, as you should see from the code. – gexicide May 25 '14 at 20:14
  • @gexicide Please check my response and if it solves your problem (as it did for others) please mark it as the correct answer. Thanks. – Diego Giagio Apr 05 '15 at 13:49