12

I tried to find any mention of compression handling in the new Java HTTP Client but failed. Is there a built-in configuration to handle, e.g., gzip or deflate compression?

I would expect there to be a BodyHandler for this, something like:

HttpResponse.BodyHandlers.ofGzipped(HttpResponse.BodyHandlers.ofString())

but I don't see any. I don't see any configuration in HttpClient either. Am I looking in the wrong place or was this intentionally not implemented and deferred to support libraries?

Bobulous
Krzysztof Krasoń
  • Have you tried looking at the network log? If the client attaches the header `Accept-Encoding: gzip`, it supports it. Note that the headers you see on the application side of the HTTP client often differ from those on the network side. – Patrick Nov 27 '18 at 16:02
  • Going through some documentation, I felt this might be related to the question: [HPACK (Header Compression for HTTP/2) implementation](https://bugs.openjdk.java.net/browse/JDK-8153353). The details on [Indexing Tables used in compression](https://httpwg.org/specs/rfc7541.html#indexing.tables) do mention both of your sample compression headers in the [Appendix](https://httpwg.org/specs/rfc7541.html#static.table.definition). – Naman Nov 27 '18 at 16:22
  • @patrickf Actually, I added such a header and was surprised that I got uncompressed content. – Krzysztof Krasoń Nov 27 '18 at 20:30
  • @RomainHippeau How is it a duplicate? The question you linked is about Apache HTTP client and mine is about the Java HTTP client (the one embedded in Java 11); notice the tags. – Krzysztof Krasoń Nov 27 '18 at 20:34

3 Answers

17

I was also surprised that the new java.net.http framework doesn't handle this automatically, but the following works for me to handle HTTP responses which are received as an InputStream and are either uncompressed or compressed with gzip:

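import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.http.HttpResponse;
import java.util.zip.GZIPInputStream;

// Returns the response body, wrapped in a decompressing stream when needed.
public static InputStream getDecodedInputStream(
        HttpResponse<InputStream> httpResponse) {
    String encoding = determineContentEncoding(httpResponse);
    try {
        switch (encoding) {
            case "":
                // No Content-Encoding header: the body is not compressed.
                return httpResponse.body();
            case "gzip":
                return new GZIPInputStream(httpResponse.body());
            default:
                throw new UnsupportedOperationException(
                        "Unexpected Content-Encoding: " + encoding);
        }
    } catch (IOException ioe) {
        throw new UncheckedIOException(ioe);
    }
}

// Returns the Content-Encoding header value, or "" if the header is absent.
public static String determineContentEncoding(
        HttpResponse<?> httpResponse) {
    return httpResponse.headers().firstValue("Content-Encoding").orElse("");
}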

Note that I've not added support for the "deflate" type (because I don't currently need it, and the more I read about "deflate" the more of a mess it sounded). But I believe you can easily support "deflate" by adding a check to the above switch block and wrapping the httpResponse.body() in an InflaterInputStream.
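As a rough sketch of what that extra case might look like, the switch can be pulled into a helper that takes the stream and the encoding directly. The class name `ContentDecoding` and the method `decode` are made up for illustration. It assumes the server sends zlib-wrapped data for "deflate"; some servers reportedly send raw deflate instead, in which case an `Inflater` created with `nowrap = true` would be needed, and this sketch does not try to detect that.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper; the names are illustrative, not part of the JDK. */
final class ContentDecoding {

    private ContentDecoding() {}

    /** Wraps body according to the given Content-Encoding value. */
    static InputStream decode(InputStream body, String encoding)
            throws IOException {
        switch (encoding.trim().toLowerCase(Locale.ROOT)) {
            case "":
            case "identity":
                // Nothing to undo.
                return body;
            case "gzip":
                return new GZIPInputStream(body);
            case "deflate":
                // Assumption: zlib-wrapped data, which is what the default
                // Inflater behind InflaterInputStream expects.
                return new InflaterInputStream(body);
            default:
                throw new UnsupportedOperationException(
                        "Unexpected Content-Encoding: " + encoding);
        }
    }
}
```

With this in place, `getDecodedInputStream` could delegate to `ContentDecoding.decode(httpResponse.body(), encoding)` inside its existing try block.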

Bobulous
  • It's a good answer. However, at least once I came across a website that returned Content-Encoding: gzip although the body was not actually gzip-encoded, and this kind of code would throw an exception. To deal with it, I fetched the response with BodyHandlers.ofByteArray() and, if Content-Encoding was set to gzip, tried new GZIPInputStream(new ByteArrayInputStream(bytes)); if that threw an exception, I simply used the byte[] as raw data. It's less efficient, but it was critical for me, since I had no control over the website with the wrong encoding but needed to use it. – Kivan Jun 14 '21 at 15:04
  • @Kivan I had this issue at some point too. I remember developing a workaround that discovers the actual payload content using a `java.io.PushbackInputStream`, so it's easy to check the [Gzip payload header](https://en.wikipedia.org/wiki/Gzip) (`1F 8B 08`), push the bytes back, and then either wrap the stream in a `GZIPInputStream` or not. – bric3 Aug 09 '22 at 16:25
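
The `PushbackInputStream` idea from the comment above might look roughly like the following. This is only a sketch: `GzipSniffer` and `maybeGunzip` are made-up names, and it checks just the two gzip magic bytes (`1F 8B`) rather than the third compression-method byte (`08`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper; the names are illustrative, not part of the JDK. */
final class GzipSniffer {

    private GzipSniffer() {}

    /**
     * Peeks at the first two bytes and wraps the stream in a GZIPInputStream
     * only if they match the gzip magic number, whatever the headers claim.
     */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] magic = new byte[2];
        int n = in.readNBytes(magic, 0, 2);
        if (n > 0) {
            // Push the peeked bytes back so the next reader sees them again.
            in.unread(magic, 0, n);
        }
        boolean looksGzipped = n == 2
                && (magic[0] & 0xFF) == 0x1F
                && (magic[1] & 0xFF) == 0x8B;
        return looksGzipped ? new GZIPInputStream(in) : in;
    }
}
```

Unlike the buffer-and-retry approach, this keeps the body streaming, at the cost of a false positive if an uncompressed payload happens to start with those two bytes.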
7

You can use Methanol. It has decompressing BodyHandler implementations, with out-of-the-box support for gzip & deflate. There's also a module for brotli.

var response = client.send(request, MoreBodyHandlers.decoding(BodyHandlers.ofString()));

Note that you can use any BodyHandler you want. MoreBodyHandlers::decoding makes it seem to your handler like the response was never compressed! It takes care of the Content-Encoding header and all.

Better yet, you can use Methanol's own HttpClient, which does transparent decompression after adding the appropriate Accept-Encoding to your requests.

var client = Methanol.create();
var request = MutableRequest.GET("https://example.com");
var response = client.send(request, BodyHandlers.ofString()); // The response is transparently decompressed
4

No, gzip/deflate compression is not handled by default. You would have to implement that in your application code if you need it - e.g. by providing a customized BodySubscriber to handle it. Alternatively - you may want to have a look at whether some of the reactive stream libraries out there offer such a feature, in which case you might be able to pipe that in by using one of the BodyHandlers.fromSubscriber(Flow.Subscriber<? super List<ByteBuffer>> subscriber) or BodyHandlers.ofPublisher() methods.
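
One possible shape for such a customized handler is sketched below. The names `GzipBodyHandlers`, `ofDecodedByteArray`, and `gunzip` are hypothetical. It deliberately buffers the whole body with `BodySubscribers.ofByteArray()` and decompresses afterwards, so the mapping function never blocks waiting for network data; a streaming variant would need more care. Only gzip is handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.http.HttpResponse.BodyHandler;
import java.net.http.HttpResponse.BodySubscribers;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper; the names are illustrative, not part of the JDK. */
final class GzipBodyHandlers {

    private GzipBodyHandlers() {}

    /** Buffers the body, then gunzips it if Content-Encoding says gzip. */
    static BodyHandler<byte[]> ofDecodedByteArray() {
        return responseInfo -> {
            String encoding = responseInfo.headers()
                    .firstValue("Content-Encoding").orElse("");
            if (encoding.equalsIgnoreCase("gzip")) {
                // The mapper runs only once all bytes have arrived, so the
                // GZIPInputStream reads from memory, not from the network.
                return BodySubscribers.mapping(
                        BodySubscribers.ofByteArray(),
                        GzipBodyHandlers::gunzip);
            }
            return BodySubscribers.ofByteArray();
        };
    }

    /** Decompresses a complete gzip payload held in memory. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

It would be used as `client.send(request, GzipBodyHandlers.ofDecodedByteArray())`, with the request carrying an `Accept-Encoding: gzip` header set by the caller.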

daniel
  • It's a pity, considering that there is already a GZIPInputStream/GZIPOutputStream in the standard library. – Krzysztof Krasoń Nov 29 '18 at 15:56
  • Right. Though using Input/OutputStream would force you back to synchronous mode when pulling the request bytes. Maybe you could use `BodyPublishers.ofInputStream(..)` and `BodySubscribers.ofInputStream()` with some combination of PipedInputStream/PipedOutputStream and GZIPInputStream/GZIPOutputStream too - but then you'd still have to pull the request bytes. – daniel Nov 29 '18 at 17:16
  • I did try the `BodySubscriber` approach (see [this question](https://stackoverflow.com/questions/53379087/wrapping-bodysubscriberinputstream-in-gzipinputstream-leads-to-hang)) but it led to a total hang. So instead I went with the less glamorous approach which I've described in my answer to @KrzysztofKrasoń and that works fine. Frustrating, though. – Bobulous Jan 11 '19 at 17:13
  • Is there an example to refer to using this approach while writing custom BodyHandler? – hemu Aug 31 '21 at 14:00