49

So, I've come to the conclusion that Apache HttpComponents 4 is one of the most overwrought APIs I've ever come across. Things that seem like they should be simple are taking hundreds of lines of code (and I'm still not sure resources get cleaned up correctly).

Plus it wants me to do things like:

List<NameValuePair> qparams = new ArrayList<NameValuePair>();
qparams.add(new BasicNameValuePair("q", "httpclient"));
qparams.add(new BasicNameValuePair("btnG", "Google Search"));
qparams.add(new BasicNameValuePair("aq", "f"));
qparams.add(new BasicNameValuePair("oq", null));
URI uri = URIUtils.createURI("http", "www.google.com", -1, "/search", 
  URLEncodedUtils.format(qparams, "UTF-8"), null);

Which, just... no. I know it's Java, and we're not into the whole brevity thing, but that's a little much. Not to mention the jars are up to 700KB.

Anyway, enough ranting, I wanted to see what kind of experiences people have had with other HTTP client libraries?

The ones I'm aware of are: Jetty, hotpotato, and AsyncHttpClient.

This is for server-side use. I'm mostly interested in performance for many concurrent GETs and large file transfers.

Any recommendations?

PS I know the venerable HttpClient 3.1 is still there, but I'd like to use something that's supported.

Update

@oleg: this is what the docs suggest:

    HttpClient httpclient = new DefaultHttpClient();
    try {
        HttpGet httpget = new HttpGet("http://www.apache.org/");
        HttpResponse response = httpclient.execute(httpget);
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            InputStream instream = entity.getContent();
            try {
                instream.read();
            } catch (IOException ex) {
                throw ex;
            } catch (RuntimeException ex) {
                httpget.abort();
                throw ex;
            } finally {
                try { instream.close(); } catch (Exception ignore) {}
            }
        }
    } finally {
        httpclient.getConnectionManager().shutdown();
    }

I still get unexpected errors when consuming entity content when using ThreadSafeClientConnManager. I'm sure it's my fault, but at this point I don't really want to have to figure it out.

Hey, I don't mean to disparage anyone's work here, but I've been making a good-faith effort to use HttpComponents since 4.0 came out and it's just not working for me.

Dmitri
  • 1
    While not flawless, have you considered the standard URLConnection/HttpURLConnection? – nos Mar 24 '11 at 08:10
  • 4
    Having to call InputStream#close() to release allocated resources is massively over-complex, isn't it? – ok2c Mar 24 '11 at 08:20
  • I would really like to hear what you have discovered since posting this. I'm in the same boat :) – nash Nov 29 '11 at 17:03
  • Adding a comment since the Jetty http client link mentioned above is so so old, here is the actual client docs. https://www.eclipse.org/jetty/documentation/current/http-client.html – jesse mcconnell Aug 17 '13 at 11:03
  • 3
    Since 4.0, the API has changed with every point release more times than I've changed my underwear – RTF Aug 15 '14 at 09:16
  • 2
    @RTF I don't know you, so I can't tell if the API changes fast, or if I should avoid the smell of your house :D – Joffrey Aug 24 '14 at 15:27
  • 1
    @Joffrey A little from column A, a little from column B – RTF Aug 24 '14 at 15:44

9 Answers

21

Complexity of HttpClient API simply reflects the complexity of its problem domain. Contrary to a popular misconception, HTTP is a fairly complex protocol. As a low-level transport library, the HC 4.0 API was primarily optimized for performance and flexibility rather than simplicity. It is regrettable that you are not able to figure it out, but so be it. You are welcome to use whatever library suits your needs best. I personally like Jetty HttpClient a lot. It is a great alternative that might work better for you.

Basil Bourque
ok2c
  • 4
    I agree with both you (+1) and the OP. The power and flexibility are necessary, but there should also be a set of facade methods somewhere that simplify the process. Methods like `public static InputStream httpGetAsStream(String baseUrl, Map parameters)` – Sean Patrick Floyd Mar 24 '11 at 09:26
  • 21
    HTTP is complex, but Apache's HttpComponents library *is* ridiculously over-engineered and contains a lot of complexity not essential to protocol operation. It objectively has a really bad API. – Alex B Jun 05 '12 at 01:50
  • 3
    @Alex B: HttpComponents are being used in all sorts of different applications ranging from simple URL fetchers to complex transports and web crawlers with different, often conflicting requirements. What may seem non-essential to some can be absolutely essential to others. HttpClient has to deal with several dozen customization parameters and context-specific strategies and objects. So, flexibility has to come before simplicity. For those who are not able to wrap their heads around the HttpClient API, there is a fluent facade API: http://hc.apache.org/httpcomponents-client-dev/fluent-hc/index.html – ok2c Jun 06 '12 at 14:54
  • 5
    "Complexity of HttpClient API simply reflects the complexity of its problem domain." If HC 4.0 was perfect then that would be a good point, but it is not. It has many problems and I imagine a lack of direction led it into the mess it is today. I think Alex B is right when he says that objectively the API is bad and could have been *much* better. Could have been. – Zombies Nov 20 '12 at 20:27
  • 2
    @SeanPatrickFloyd `there should be a set of façade methods` Indeed there *is* such a set of methods: the [Apache HttpComponents Fluent API](https://hc.apache.org/httpcomponents-client-ga/tutorial/html/fluent.html). (as oleg points out in comment above) – Basil Bourque Mar 18 '14 at 17:21
  • @oleg I am making the switch to Jetty HttpClient and I like it a lot so far, but I cannot capture HTTPS traffic in Fiddler. I tried many things but I still get an exception when I proxy HTTPS traffic to Fiddler. This is essential for me as I need it for debugging purposes. I have no problem capturing Apache HttpClient's or URLConnection's traffic in Fiddler. Do you or anyone here know what I need to do? – Arya Sep 30 '16 at 07:46
16

For simple use cases you can use HttpClient Fluent API. See tutorials.

This module provides an easy-to-use facade API for HttpClient based on the concept of a fluent interface. Fluent facade API exposes only the most fundamental functions of HttpClient and is intended for simple use cases that do not require the full flexibility of HttpClient. For instance, fluent facade API relieves the users from having to deal with connection management and resource deallocation.

    // Execute a GET with timeout settings and return response content as String.
    Request.Get("http://somehost/")
            .connectTimeout(1000)
            .socketTimeout(1000)
            .execute().returnContent().asString();

Maven artifact.

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>fluent-hc</artifactId>
    <version>4.2.5</version>
</dependency>
Vahe Harutyunyan
  • 3
    Update: Nowadays there are plenty of better alternatives. OkHttp and Retrofit are fabulous libraries that are concise and functionally perfect for most modern applications. – forresthopkinsa Jul 20 '17 at 16:13
6

Answering my own question since this got resurrected for some reason.

I ended up writing a few simple wrappers around java.net.HttpURLConnection; it seems to have come a long way since the last time I seriously considered it.

Apache HttpComponents is great, but can be overkill for simple tasks. Also, at least in my scenario, HttpURLConnection (HUC) is noticeably faster (mostly single-threaded; I haven't done any testing under heavy load).
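
A minimal sketch of what a wrapper of this kind might look like follows. It is not the actual code referred to above: the class name `SimpleHttp`, the method names, the one-second timeouts, and the assumption that response bodies are UTF-8 text are all illustrative choices. It uses only classes from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper; the name and the defaults below are illustrative only.
final class SimpleHttp {
    private static final int CONNECT_TIMEOUT_MS = 1000; // assumed value
    private static final int READ_TIMEOUT_MS = 1000;    // assumed value

    private SimpleHttp() {}

    /** Encodes parameters as name=value pairs joined by '&'; null values become empty. */
    static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                String value = e.getValue() == null ? "" : e.getValue();
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(value, "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is guaranteed to exist
        }
        return sb.toString();
    }

    /** Issues a GET and returns the body decoded as UTF-8 (an assumption about the server). */
    static String get(String baseUrl, Map<String, String> params) throws IOException {
        String query = buildQuery(params);
        URI uri = URI.create(query.isEmpty() ? baseUrl : baseUrl + "?" + query);
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS);
        conn.setReadTimeout(READ_TIMEOUT_MS);
        conn.setRequestMethod("GET");

        int status = conn.getResponseCode();
        if (status < 200 || status >= 300) {
            InputStream err = conn.getErrorStream(); // may be null
            if (err != null) err.close();
            throw new IOException("HTTP " + status + " from " + uri);
        }
        // Closing the stream releases the connection back to the JRE-wide pool.
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

With a `LinkedHashMap` holding `q` and `btnG`, a call such as `SimpleHttp.get("http://www.google.com/search", params)` would cover the query-building example from the question in one line. The sketch buffers the whole body in memory, so it would not suit large file transfers as written.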

Basil Bourque
Dmitri
  • 2
    HttpURLConnection instances are significantly cheaper to create than HttpClient instances because they share one static JRE-wide pool of connections. By default HttpClient always creates a new pool of connections and therefore is slower to start up and warm up. One can address this problem by re-using the same instance of HttpClient for new requests. This approach is used by HttpClient's fluent facade, for instance: http://hc.apache.org/httpcomponents-client-dev/fluent-hc/index.html. In all other cases I can think of, HttpClient should be comfortably faster – ok2c Mar 30 '12 at 07:02
  • 2
    Maybe I'm doing something wrong, this isn't exactly rigorous benchmarking. I do always use a single HttpClient instance. Just ran a quick test: fetching from localhost, very small document (150 bytes) x 5000 times takes roughly 10 seconds with HttpClient (4.1) and under 2 seconds with HUC. This actually isn't just academic, my main use-case is lots of small lookups against services on the same machine. – Dmitri Mar 30 '12 at 09:06
  • 2
    I can only give you my (biased) perspective. Any HTTP performance test that lasts 2 seconds is simply not representative. One needs to run the benchmark for a few minutes to get more or less reliable numbers. Here is the benchmark and some results that we use internally: http://wiki.apache.org/HttpComponents/HttpClient3vsHttpClient4vsHttpCore. HttpClient 4.x performance seems to be quite all right compared to HUC and other clients. – ok2c Mar 30 '12 at 16:08
  • 2
    Of course. Like I said, it's by no means a comprehensive benchmark. Then again, in _my_ particular case, which is not covered by your benchmark, there is a noticeable difference. I would not generalize from that (certainly isn't representative), but it does factor into my decision of what to use in my scenario. – Dmitri Mar 30 '12 at 17:31
4

Google HTTP Client

Another library is Google HTTP Client Library for Java.

Written by Google, this library is a flexible, efficient, and powerful Java client library for accessing any resource on the web via HTTP. It features a pluggable HTTP transport abstraction that allows any low-level library to be used, such as java.net.HttpURLConnection, Apache HTTP Client, or URL Fetch on Google App Engine. It also features efficient JSON and XML data models for parsing and serialization of HTTP response and request content. The JSON and XML libraries are also fully pluggable, including support for Jackson and Android's GSON libraries for JSON.

Basil Bourque
4

jsoup

jsoup is a library designed to parse HTML files. It does make HTTP calls to retrieve the source code of a web page.

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Basil Bourque
3

I'm a fan of the client API from JAX-RS (standardized in 2.0) and Jersey's implementation in particular. It supports async and has connectors so that it can be backed by Apache HttpComponents, plain HttpURLConnection, Jetty or Grizzly.

There are some good examples of usage here, including the following.

client.target(REST_SERVICE_URL)
      .path("/{bookId}")
      .resolveTemplate("bookId", bookId)
      .request()
      .get(Book.class);
Chris H.
3

You could use Netty or Apache MINA, although they are very low-level and I'm not sure you will end up with less verbose code.

CarlosZ
1

HttpUnit has a great interface (not much code needed), but the latest version of it submits duplicate requests.

HtmlUnit will work, but in my experience it has limited support for JavaScript. I've been able to use it for basic web pages though.

A B
1

You could have a look at Restlet's client capabilities. It's a higher-level layer that can be backed by Apache HttpComponents or Java's java.net API, for example.

Bruno
  • 1
    I've used Jersey's client, which is conceptually pretty similar. It is pretty convenient for a lot of cases. – Dmitri Mar 30 '12 at 05:25