
How would I properly zip bytes to a ByteArrayOutputStream and then read that using a ByteArrayInputStream? I have the following method:

private byte[] getZippedBytes(final String fileName, final byte[] input) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ZipOutputStream zipOut = new ZipOutputStream(bos);
    ZipEntry entry = new ZipEntry(fileName);
    entry.setSize(input.length);
    zipOut.putNextEntry(entry);
    zipOut.write(input, 0, input.length);
    zipOut.closeEntry();
    zipOut.close();

    //Turn right around and unzip what we just zipped
    ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(bos.toByteArray()));

    while((entry = zipIn.getNextEntry()) != null) {
        assert entry.getSize() >= 0;
    }

    return bos.toByteArray();
}

When I execute this code, the assertion at the bottom fails because entry.getSize() returns -1. I don't understand why the extracted entry doesn't match the entry that was zipped.

Matthieu
Benny
  • Why? You already have the bytes. Why would you want to zip and unzip them just to get back what you already have? – user207421 Dec 20 '16 at 20:26
  • This is just a sample as a proof of concept. In my actual scenario I'm creating a mock multipart file with the zipped file's bytes so I can test that another class is correctly unzipping the content. – Benny Dec 20 '16 at 23:07
  • What is the size of bos.toByteArray()? – Gregory.K Dec 30 '16 at 20:35

2 Answers


Why is the size -1?

Calling getNextEntry on a ZipInputStream just positions the read cursor at the start of the entry to read.

The size (along with other metadata) is stored after the entry's actual data, so it is not yet available when the cursor is positioned at the start.

This information becomes available only after you read the whole entry data or move on to the next entry.

For example, going to the next entry:

// position at the start of the first entry
entry = zipIn.getNextEntry();
ZipEntry firstEntry = entry;    
// size is not yet available
System.out.println("before " + firstEntry.getSize()); // prints -1

// position at the start of the second entry
entry = zipIn.getNextEntry();
// size is now available
System.out.println("after " + firstEntry.getSize()); // prints the size

or reading the whole entry data:

// position at the start of the first entry
entry = zipIn.getNextEntry();
// size is not yet available
System.out.println("before " + entry.getSize()); // prints -1

// read the whole entry data
while(zipIn.read() != -1);

// size is now available
System.out.println("after " + entry.getSize()); // prints the size
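To make the point about where the size lives easier to try out, here is a minimal self-contained sketch. The class name EntrySizeDemo and its helper methods are hypothetical, made up for illustration. It assumes the archive is written by ZipOutputStream: with the default DEFLATED method the size should be unknown right after getNextEntry() and known once the data has been read, whereas a STORED entry has to carry its size and CRC-32 up front, so the size should be readable immediately.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class EntrySizeDemo {

    // Zips `input` as a single entry; `stored` selects STORED instead of the default DEFLATED.
    static byte[] zip(String name, byte[] input, boolean stored) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            ZipEntry entry = new ZipEntry(name);
            if (stored) {
                // STORED entries must have size and CRC-32 set before putNextEntry()
                CRC32 crc = new CRC32();
                crc.update(input);
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(input.length);
                entry.setCompressedSize(input.length);
                entry.setCrc(crc.getValue());
            }
            zipOut.putNextEntry(entry);
            zipOut.write(input, 0, input.length);
            zipOut.closeEntry();
        }
        return bos.toByteArray();
    }

    // Returns getSize() of the first entry right after getNextEntry(), before any data is read.
    static long sizeBeforeReading(byte[] zipped) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            return zipIn.getNextEntry().getSize();
        }
    }

    // Returns getSize() of the first entry after its data has been read to the end.
    static long sizeAfterReading(byte[] zipped) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            ZipEntry entry = zipIn.getNextEntry();
            while (zipIn.read() != -1) { /* discard the data */ }
            return entry.getSize();
        }
    }
}
```

STORED means no compression at all, so this is only worth considering when the size really has to be known before reading and the data is small or already compressed.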

This misunderstanding is quite common, and several bug reports about it have been closed as "Not an Issue", for example JDK-4079029, JDK-4113731 and JDK-6491622.

As the bug reports also mention, you could use ZipFile instead of ZipInputStream, which lets you read the size information before accessing the entry data; but to create a ZipFile you need a File (see the constructors) rather than a byte array.

For example:

File file = new File("test.zip");
// try-with-resources closes the ZipFile when done
try (ZipFile zipFile = new ZipFile(file)) {
    Enumeration<? extends ZipEntry> enumeration = zipFile.entries();
    while (enumeration.hasMoreElements()) {
        ZipEntry zipEntry = enumeration.nextElement();
        System.out.println(zipEntry.getSize()); // prints the size
    }
}
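Since the question starts from a byte array rather than a File, one way to bridge the gap is to write the bytes to a temporary file first. The following is only a sketch of that idea; the class ZipFileFromBytes and the method entrySizes are hypothetical names, and whether a temporary file is acceptable depends on your test setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, for illustration only.
class ZipFileFromBytes {

    // Writes the zipped bytes to a temporary file and returns entry name -> uncompressed size.
    static Map<String, Long> entrySizes(byte[] zipped) throws IOException {
        Path tmp = Files.createTempFile("zipped-", ".zip");
        try {
            Files.write(tmp, zipped);
            Map<String, Long> sizes = new LinkedHashMap<>();
            try (ZipFile zipFile = new ZipFile(tmp.toFile())) {
                Enumeration<? extends ZipEntry> entries = zipFile.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry e = entries.nextElement();
                    sizes.put(e.getName(), e.getSize());
                }
            }
            return sizes;
        } finally {
            // always remove the temporary file, even if reading failed
            Files.deleteIfExists(tmp);
        }
    }
}
```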

How to get the data from the input stream?

If you want to check if the unzipped data is equal to the original input data, you could read from the input stream like so:

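byte[] output = new byte[input.length];
entry = zipIn.getNextEntry();
// a single read() may return fewer bytes than requested, so loop until the buffer is full
int off = 0, n;
while (off < output.length && (n = zipIn.read(output, off, output.length - off)) > 0) {
    off += n;
}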

System.out.println("Are they equal? " + Arrays.equals(input, output));

// and if we want the size
zipIn.getNextEntry(); // or zipIn.read();
System.out.println("and the size is " + entry.getSize());

Now output should have the same content as input.
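If the receiving side does not know the original length (which is usually the case when testing another class that unzips the content), the entry can be read until end of stream instead. Here is a minimal round-trip sketch under that assumption; ZipRoundTrip, zip and unzipFirstEntry are hypothetical names, and only the first entry is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class ZipRoundTrip {

    // Zips `input` into an in-memory archive containing a single entry.
    static byte[] zip(String fileName, byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            zipOut.putNextEntry(new ZipEntry(fileName));
            zipOut.write(input, 0, input.length);
            zipOut.closeEntry();
        }
        return bos.toByteArray();
    }

    // Reads the first entry completely, without relying on ZipEntry.getSize().
    static byte[] unzipFirstEntry(byte[] zipped) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            if (zipIn.getNextEntry() == null) {
                throw new IOException("archive contains no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            // read() returns -1 once the end of the current entry is reached
            while ((n = zipIn.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

A test could then compare the result with Arrays.equals(input, ZipRoundTrip.unzipFirstEntry(ZipRoundTrip.zip("test.txt", input))).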

Loris Securo
  • Apparently using `ZipInputStream#closeEntry()` has the same effect as `ZipInputStream#getNextEntry()` as far as `ZipEntry#getSize()` is concerned. In any case, both the above approaches will not allow the preceding entry's data to be read once they are invoked. – Ravindra HV Dec 31 '16 at 18:30
  • @RavindraHV if you think about it, it's quite logical: according to the Javadoc of `closeEntry()`: "Closes the current ZIP entry and positions the stream for reading the next entry." This actually means for me (based on my limited knowledge of the ZIP layout) that the entry must be read into a blackhole to be able to "close" it. In this case probably they leveraged some common facility in `getNextEntry()` and `closeEntry()`, and that common facility in turn sets the `size` of the previous entry. – D. Kovács Jan 05 '17 at 10:05

How to zip byte[] and unzip it back?

I routinely use the following methods to deflate/inflate (zip/unzip) small byte[] (i.e. when the data fits in memory). They are based on the example given in the Deflater javadoc and use the Deflater class to compress the data and the Inflater class to decompress it:

public static byte[] compress(byte[] source, int level) {
    Deflater compresser = new Deflater(level);
    compresser.setInput(source);
    compresser.finish();
    byte[] buf = new byte[1024];
    ByteArrayOutputStream bos = new ByteArrayOutputStream(1024);
    int n;
    while ((n = compresser.deflate(buf)) > 0)
        bos.write(buf, 0, n);
    compresser.end();
    return bos.toByteArray(); // You could as well return "bos" directly
}

public static byte[] uncompress(byte[] source) {
    Inflater decompresser = new Inflater();
    decompresser.setInput(source);
    byte[] buf = new byte[1024];
    ByteArrayOutputStream bos = new ByteArrayOutputStream(1024);
    try {
        int n;
        while ((n = decompresser.inflate(buf)) > 0)
            bos.write(buf, 0, n);
        return bos.toByteArray();
    } catch (DataFormatException e) {
        return null;
    } finally {
        decompresser.end();
    }
}

There is no need for a ByteArrayInputStream, although you could wrap one in an InflaterInputStream if you really want to; using the Inflater directly is easier.
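For completeness, here is roughly what the stream-based variant could look like. This is a sketch, not a drop-in replacement: StreamCodec is a hypothetical class name, and the compression level is left at the default rather than being passed in as a parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, for illustration only.
class StreamCodec {

    // Compresses `source` with the default Deflater settings (zlib format).
    static byte[] compress(byte[] source) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(source);
        } // closing the stream finishes the compressed data
        return bos.toByteArray();
    }

    // Decompresses zlib-format data by wrapping a ByteArrayInputStream.
    static byte[] uncompress(byte[] source) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(source))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }
}
```

Both streams use the zlib format by default, the same as a Deflater or Inflater created without the nowrap flag, so the two approaches should be interchangeable.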

Matthieu
  • For those who want to downvote again without commenting on how to improve the answer (which is not illegal), the question is "How to zip bytes to `ByteArrayOutputStream` and back", not "How to use `ZipFile`" to achieve compression. – Matthieu Jan 02 '17 at 11:04
  • The title should probably be edited (I'll find something better later), but the issue at hand is reading the `ZipEntry` details as you're unzipping. Your answer doesn't address that. – Sotirios Delimanolis Jan 04 '17 at 00:21
  • @SotiriosDelimanolis thank you for the feedback. To me it sounded like the title is ok but `ZipEntry` was the wrong tool to achieve it, hence my answer. But thanks again for giving me a chance to explain myself :) – Matthieu Jan 04 '17 at 08:28
  • Based on [this SO](http://stackoverflow.com/questions/40672443/java-process-memory-usage-keeps-increasing-infinitely) Update #4 the `Deflater` (and `Inflater`) has its own problems. – D. Kovács Jan 05 '17 at 10:08
  • @D.Kovács thanks for the link. The workaround given in the OpenJDK bug is to call `Deflater/Inflater.end()`, which is done in the above code. – Matthieu Jan 06 '17 at 10:29