16

I can find plenty of functions that let you decompress a GZip file, but how do I decompress a GZip string?

I'm trying to parse an HTTP response where the response body is compressed with GZip. However, the entire response is simply stored in a string, so part of the string contains binary chars.

I'm attempting to use:

byte responseBodyBytes[] = responseBody.getBytes();
ByteArrayInputStream bais = new ByteArrayInputStream(responseBodyBytes); 
GZIPInputStream gzis = new GZIPInputStream(bais);

But that just throws an exception: java.io.IOException: Not in GZIP format

Matt
  • Does this answer your question? [GZIPInputStream to String](https://stackoverflow.com/questions/3627401/gzipinputstream-to-string) – Yash Oct 09 '20 at 05:28

3 Answers

15

There's no such thing as a GZip string. GZip is binary, strings are text.
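To see why the distinction matters, here is a minimal sketch of what can happen when gzip bytes are pushed through a String and back. It assumes UTF-8 is used in both directions (the question's code uses the platform default charset, which may differ), and the class and method names are made up for this illustration. The second gzip magic byte, 0x8b, is not valid UTF-8 on its own, so decoding replaces it and the original bytes cannot be recovered.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipStringCorruption {

    // Hypothetical helper: compress some text into gzip bytes.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Push binary data through a String and back (assumption: UTF-8 both ways).
    public static byte[] roundTripThroughString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // True if GZIPInputStream accepts the header of the given bytes.
    public static boolean opensAsGzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = gzip("hello, gzip");
        byte[] mangled = roundTripThroughString(original);
        System.out.println("bytes survived: " + Arrays.equals(original, mangled));
        System.out.println("original opens: " + opensAsGzip(original));
        System.out.println("mangled opens:  " + opensAsGzip(mangled));
    }
}
```

The mangled copy no longer starts with the gzip magic bytes, which is consistent with the "Not in GZIP format" exception from the question.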

If you want to compress a string, you need to convert it into binary first, e.g. with an OutputStreamWriter chained to a compressing OutputStream (e.g. a GZIPOutputStream).

Likewise to read the data, you can use an InputStreamReader chained to a decompressing InputStream (e.g. a GZIPInputStream).

One way of easily reading from a Reader is to use CharStreams.toString(Readable) from Guava, or a similar library.
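As a rough sketch of the chaining described above, using only the JDK: the class name `GzipText`, its method names, and the choice of UTF-8 are assumptions for the example, not anything from the question. A plain read loop stands in for Guava's CharStreams.toString here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipText {

    // Text -> bytes: a Writer encodes chars as UTF-8, GZIPOutputStream compresses them.
    public static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the writer finishes the gzip stream
        return bytes.toByteArray();
    }

    // Bytes -> text: GZIPInputStream decompresses, a Reader decodes UTF-8 into chars.
    public static String decompress(byte[] gzipped) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)), StandardCharsets.UTF_8)) {
            char[] chunk = new char[4096];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] gzipped = compress("some response text");
        System.out.println(decompress(gzipped));
    }
}
```

The compressed data stays a byte[] the whole time; only the decompressed result is turned into a String.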

Jonathan
Jon Skeet
  • I'm trying to parse a HTTP response where the response body is compressed with GZip. However, the entire response is simply stored in a string so part of the string contains binary chars. Are you saying that it is not possible to convert this "GZip string" into a text string? – Matt Sep 01 '10 at 21:06
  • @Matt: You shouldn't be storing the response in a string to start with. If it's binary, it shouldn't be in text at all, unless it's base64. The concept of "part of the string contains binary data" really doesn't work. It sounds like you need to change your approach. – Jon Skeet Sep 01 '10 at 21:16
  • The response is initially presented as a byte[], so that's all I have available. Could I use this? – Matt Sep 01 '10 at 21:24
  • @Jon Skeet I now have the same problem. Would you recommend storing the response in `byte[]`? – Amir Rachum Apr 25 '11 at 10:15
  • @Amir: I don't know what you're trying to do, so it's hard to say. I suggest you put more context into a new question. – Jon Skeet Apr 25 '11 at 10:17
  • @Jon http://stackoverflow.com/questions/5777503/how-to-store-an-http-response-that-may-contain-binary-data – Amir Rachum Apr 25 '11 at 10:33
1

Ideally you should use a high-level library to handle this stuff for you. That way whenever a new version of HTTP is released, the library maintainer hopefully does all the hard work for you and you just need the updated version of the library.

That aside, it is a nice exercise to try doing it yourself.

Let's assume you are reading an HTTP response as a stream of bytes from a TCP socket. If there was no gzip encoding, then putting the whole response into a String could work. However, the presence of a "Content-Encoding: gzip" header means the response body will (as you noted) be binary.

You can identify the start of the response body as the first byte following the first occurrence of the String sequence "\r\n\r\n" (or the 4 bytes 0x0d, 0x0a, 0x0d, 0x0a).
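A minimal sketch of that search, working directly on the bytes so that nothing has to be converted to a String first. The class and method names are hypothetical, and it assumes the complete header block is already in the array.

```java
import java.nio.charset.StandardCharsets;

public class HttpBodyLocator {

    // Returns the index of the first body byte (just past the first CR LF CR LF),
    // or -1 if the blank line separating headers from body was not found.
    // If the response ends right after the blank line, the result equals response.length.
    public static int findBodyStart(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == 0x0d && response[i + 1] == 0x0a
                    && response[i + 2] == 0x0d && response[i + 3] == 0x0a) {
                return i + 4;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up sample response with a plain-text body, for illustration only.
        byte[] response = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"
                .getBytes(StandardCharsets.US_ASCII);
        int start = findBodyStart(response);
        // prints "hi"
        System.out.println(new String(response, start, response.length - start, StandardCharsets.US_ASCII));
    }
}
```

The header bytes before that index are plain ASCII text and can safely be decoded into a String for parsing; the body bytes after it should stay binary.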

The gzip encoding has a special header, and you should test the first 3 body bytes for that:

    // Needs java.io.ByteArrayInputStream, java.io.ByteArrayOutputStream,
    // java.io.IOException and java.util.zip.GZIPInputStream
    byte[] buf;  // from the HTTP response stream
    // ... insert code here to populate buf from the HTTP response stream
    // ...
    int bodyLen = 1234;  // populate this value from the 'Content-Length' header
    int bodyStart = 123; // index in buf where the body starts
    // gzip magic bytes 0x1f 0x8b, followed by the compression method (0x08 = deflate)
    if (bodyLen > 4 && buf[bodyStart] == 0x1f && buf[bodyStart + 1] == (byte) 0x8b && buf[bodyStart + 2] == 0x08) {
        // gzip compressed body: wrap only the body bytes, not the headers
        ByteArrayInputStream bais = new ByteArrayInputStream(buf, bodyStart, bodyLen);

        // Decompress the bytes; the output size is not known in advance, so let it grow
        ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
        try (GZIPInputStream gzis = new GZIPInputStream(bais)) {
            // a single read() may return only part of the data, so loop until end of stream
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzis.read(chunk)) != -1) {
                decompressed.write(chunk, 0, n);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        byte[] decompressedBytes = decompressed.toByteArray();
    }

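GZIPInputStream throws the "Not in GZIP format" error when the stream does not start with the gzip magic bytes (0x1f, 0x8b). In OpenJDK, a wrong third byte (the compression method) is reported separately as "Unsupported compression method". Testing these bytes up front will help resolve your particular issue.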

There is also a CRC checksum within the GZIP format; however, if that is missing or incorrect, you should see a different error.

gb96
0

Maybe this helps:

String decodedString;
try (final GZIPInputStream gzipInput = new GZIPInputStream(new ByteArrayInputStream(compressedByteArray));
     final StringWriter stringWriter = new StringWriter()) {
    // Decompress and decode the bytes as UTF-8 text (requires Apache Commons IO)
    org.apache.commons.io.IOUtils.copy(gzipInput, stringWriter, "UTF-8");
    decodedString = stringWriter.toString();
} catch (IOException e) {
    throw new UncheckedIOException("Error during decompression!", e);
}
Abbin Varghese