66

In Java does it matter whether I instantiate a ZipOutputStream first, or the BufferedOutputStream first? Example:

FileOutputStream dest = new FileOutputStream(file);
ZipOutputStream zip = new ZipOutputStream(new BufferedOutputStream(dest));

// use zip output stream to write to

Or:

FileOutputStream dest = new FileOutputStream(file);
BufferedOutputStream out = new BufferedOutputStream(new ZipOutputStream(dest));

// use buffered stream to write to

In my non-scientific timings I can't seem to tell much of a difference here. I can't see anything in the Java API that says if one of these ways is necessary or preferred. Any advice? It seems like compressing the output first and then buffering it for writes would be more efficient.

Basil Bourque
jjathman
  • 2
    Theoretically, compressing then buffering is going to be faster. However, `GZIPOutputStream` has an internal buffer, so it doesn't write individual bytes out to the underlying stream. Depending on the underlying stream type (e.g., file vs. socket) and the relative sizes of the buffers, you may or may not see any difference. – parsifal Jan 22 '13 at 17:47

2 Answers

105

You should always wrap the BufferedOutputStream with the ZipOutputStream, never the other way around. See the code below:

FileOutputStream fos = new FileOutputStream("hello-world.zip");
BufferedOutputStream bos = new BufferedOutputStream(fos);
ZipOutputStream zos = new ZipOutputStream(bos);

try {
    for (int i = 0; i < 10; i++) {
        // not available on BufferedOutputStream
        zos.putNextEntry(new ZipEntry("hello-world." + i + ".txt"));
        zos.write("Hello World!".getBytes());
        // not available on BufferedOutputStream
        zos.closeEntry();
    }
}
finally {
    zos.close();
}

As the comments say, the putNextEntry() and closeEntry() methods are not available on the BufferedOutputStream. If you call write() without first calling putNextEntry(), ZipOutputStream throws java.util.zip.ZipException: no current ZIP entry.

For the sake of completeness, note that the finally clause only calls close() on the ZipOutputStream. This is because, by convention, all built-in Java output stream wrappers propagate close() to the stream they wrap.
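The same pattern can be sketched with try-with-resources, which closes the outermost stream automatically. This is only an illustrative variant, not the answer's original code: the class name `ZipWriteSketch`, the helper `writeAndCount`, and the temporary file are assumptions made for the example. It reads the archive back with `ZipFile` to check that the entries were actually written.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipWriteSketch {

    // Hypothetical helper: writes `count` small text entries to `target`
    // and returns the number of entries found when reading the file back.
    static int writeAndCount(Path target, int count) throws IOException {
        // Only the outermost stream is declared as a resource. Closing the
        // ZipOutputStream finishes the archive and then closes the
        // BufferedOutputStream, which flushes and closes the file stream.
        try (ZipOutputStream zos = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            for (int i = 0; i < count; i++) {
                zos.putNextEntry(new ZipEntry("hello-world." + i + ".txt"));
                zos.write("Hello World!".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        try (ZipFile zip = new ZipFile(target.toFile())) {
            return zip.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("hello-world", ".zip");
        try {
            System.out.println("entries written: " + writeAndCount(target, 10));
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```

Passing an explicit charset to `getBytes` avoids depending on the platform default encoding.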

EDIT

I just tested it the other way around. It turns out that wrapping a ZipOutputStream in a BufferedOutputStream and then only calling write() on it (without creating or closing entries) does not throw a ZipException. Instead, the resulting ZIP file is corrupt and contains no entries.
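A rough way to reproduce this check in memory is sketched below. The class name `WrongOrderSketch` and the helper `entriesAfterWrongOrder` are made up for the example. One assumption to flag: depending on the JDK version, the missing entry may be reported as an exception when the buffer is flushed on close() rather than silently ignored, so the sketch catches any IOException there and only counts the entries that can be read back.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class WrongOrderSketch {

    // Hypothetical helper: writes through a BufferedOutputStream that wraps a
    // ZipOutputStream and returns how many entries can be read from the result.
    static int entriesAfterWrongOrder() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(new ZipOutputStream(bytes));

        // This compiles because write() exists on every OutputStream.
        // putNextEntry() is not visible here, so no entry is ever started.
        out.write("Hello World!".getBytes(StandardCharsets.UTF_8));

        try {
            out.close();
        } catch (IOException e) {
            // Assumption: some JDK versions surface the missing entry here.
            System.out.println("close() failed: " + e);
        }

        int entries = 0;
        try (ZipInputStream in = new ZipInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            while (in.getNextEntry() != null) {
                entries++;
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("entries read back: " + entriesAfterWrongOrder());
    }
}
```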

Daniel Dinnyes
  • 3
    In that case, is there any point in buffering? I am not arguing here, just curious whether anyone has checked so far. – wst Apr 14 '16 at 08:45
  • 3
    As you can see in the first part of [MrSmith42's](http://stackoverflow.com/a/14462420/244935) answer, using an inner BufferedOutputStream could be beneficial, because it buffers the already compressed output before it is written to disk. You will use a bit more memory (for keeping the compressed bytes in the memory buffer before flushing to disk), but it is more efficient, as disk I/O is done in larger chunks of bytes (the size of the buffer the BufferedOutputStream was initialized with). – Daniel Dinnyes Apr 27 '16 at 10:42
  • Which [buffer size for the BufferedOutputStream](https://docs.oracle.com/javase/7/docs/api/java/io/BufferedOutputStream.html#BufferedOutputStream%28java.io.OutputStream,%20int%29) inside a `ZipOutputStream` performs best for you is something you should [figure out](https://stackoverflow.com/questions/236861/how-do-you-determine-the-ideal-buffer-size-when-using-fileinputstream) yourself. – Daniel Dinnyes Apr 27 '16 at 10:53
24

You should use:

ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(dest));

because you want to buffer the writes to disk: writing a few large blocks is much more efficient than writing many small ones.
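To get a feel for the effect, the sketch below counts how many write calls reach the underlying stream with and without the inner buffer. `WriteCallCount`, `CountingSink`, and `countWrites` are hypothetical names introduced for illustration, and the sink simply discards the data instead of writing to a real file, so it shows call counts rather than timings.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class WriteCallCount {

    // Hypothetical stand-in for a file stream: discards the data and
    // counts how many write calls reach it.
    static final class CountingSink extends OutputStream {
        long calls;

        @Override
        public void write(int b) {
            calls++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            calls++;
        }
    }

    // Writes ten small entries and returns the number of write calls
    // that reached the sink, with or without a BufferedOutputStream between.
    static long countWrites(boolean buffered) throws IOException {
        CountingSink sink = new CountingSink();
        OutputStream target = buffered ? new BufferedOutputStream(sink) : sink;
        try (ZipOutputStream zos = new ZipOutputStream(target)) {
            for (int i = 0; i < 10; i++) {
                zos.putNextEntry(new ZipEntry("hello-world." + i + ".txt"));
                zos.write("Hello World!".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("write calls without buffer: " + countWrites(false));
        System.out.println("write calls with buffer:    " + countWrites(true));
    }
}
```

The exact numbers depend on the JDK, but the buffered variant should reach the sink far less often, since the small header and data writes are collected in the buffer first.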


This

new BufferedOutputStream(new ZipOutputStream(dest));

would buffer before the zip compression. But this all happens in memory and does not need buffering, because many small memory accesses take about the same time as a few big ones. In memory, the time needed is generally proportional to the number of bytes read or written.

As mentioned in the comments:

The ZipOutputStream methods that BufferedOutputStream does not declare, such as putNextEntry and closeEntry, would also be unavailable.

MrSmith42
  • 1
    I am sure my answer is correct. But feel free to try it both ways and compare the performance (or debug them). – MrSmith42 Nov 10 '13 at 16:34
  • 8
    My point was that there is no point in comparing the performance of the two. Wrapping the `ZipOutputStream` in a `BufferedOutputStream` is meaningless altogether, as it does not expose the `putNextEntry` and `closeEntry` methods. – Daniel Dinnyes Nov 11 '13 at 10:38
  • 2
    Down-voting as the answer does not mention the fact that the methods of the ZipOutputStream are not available on the stream when wrapping the wrong way. – Christoffer Soop May 09 '15 at 16:41