Apache HttpClient and HttpConnection in a multithreaded application

Question

In my multithreaded application I send HTTP requests to about 10 servers, roughly 300 different requests per server, about once an hour. Nothing too serious.

My question is: should I keep a single HttpClient for all outgoing connections? Maybe one per unique target server? Or one per "iteration" (an iteration takes about 10 minutes at the beginning of every hour)?

I'm currently using a single PoolingHttpClientConnectionManager, and HttpClientBuilder.setConnectionManager(connectionManager).build() for every request.

This feels like a real waste of resources, and I also see many connections per server in ESTABLISHED state, even though I'm using a pooling connection manager. (The requests to each server are sent one by one, not concurrently.)

Svetlin Zarev · Accepted Answer · 2016-03-14T20:24:40.540

I'm currently using a single PoolingHttpClientConnectionManager, and HttpClientBuilder.setConnectionManager(connectionManager).build() for every request.

Building a new HttpClient for each request is a huge waste. You should use one HttpClient per configuration (each client can have a different connection manager, maximum number of concurrent requests, etc.) or one for each independent module of your application (in order not to create dependencies between otherwise independent modules).
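A minimal sketch of "one client per configuration" with the HttpClient 4.5 API used in the question: each client is built once, owns its own pool, and is reused for every request by every thread (clients built on a `PoolingHttpClientConnectionManager` can be shared between threads). The class name, the "slow servers" split and the timeout values are hypothetical, chosen only for illustration.

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// Hypothetical registry: one long-lived client per distinct configuration, shared by all threads.
final class HttpClientsByConfig {
    /** Default settings, used for most servers. */
    static final CloseableHttpClient DEFAULT = HttpClients.custom()
            .setConnectionManager(new PoolingHttpClientConnectionManager())
            .build();

    /** Separate client (and pool) for slow servers; the timeout values are assumptions. */
    static final CloseableHttpClient SLOW_SERVERS = HttpClients.custom()
            .setConnectionManager(new PoolingHttpClientConnectionManager())
            .setDefaultRequestConfig(RequestConfig.custom()
                    .setConnectTimeout(5_000)    // ms allowed to establish the TCP connection
                    .setSocketTimeout(120_000)   // ms of inactivity allowed while waiting for data
                    .build())
            .build();

    private HttpClientsByConfig() {}
}
```

Worker threads then call `HttpClientsByConfig.DEFAULT.execute(...)` instead of running `HttpClientBuilder...build()` per request.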

Also, do not forget that `.build()` returns a `CloseableHttpClient`, which means that you should call `httpClient.close()` when you are done using it; otherwise you may leak resources.
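If you go with the "one client per iteration" option from the question, try-with-resources makes the `close()` call hard to forget. A sketch assuming HttpClient 4.4 or later, where closing a client built by `HttpClientBuilder` also shuts down its connection manager unless `setConnectionManagerShared(true)` was called. That detail matters for the setup in the question, where one `PoolingHttpClientConnectionManager` is handed to many short-lived clients. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.List;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

final class HourlyIteration {
    /** Sends every request of one iteration with one client, then releases the pool. Returns the count sent. */
    static int runIteration(List<String> urls) throws IOException {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        // close() at the end of this block also shuts down cm, because it is not marked as shared.
        try (CloseableHttpClient client = HttpClients.custom().setConnectionManager(cm).build()) {
            int sent = 0;
            for (String url : urls) {
                try (CloseableHttpResponse response = client.execute(new HttpGet(url))) {
                    EntityUtils.consume(response.getEntity()); // read the body so the connection can be reused
                }
                sent++;
            }
            return sent;
        }
    }
}
```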

Update in response to a comment from @Nati:

what will be "wasted" ? is HttpClient a heavy object ?

The source code that creates an HTTP client (`HttpClientBuilder.build()`) is a lot of code, and it is pointless to execute it on each request. This unnecessarily consumes CPU and creates a lot of garbage, which reduces the performance of the whole application. The fewer allocations you do, the better! In other words, there are no benefits from creating a new client for each request, only downsides.

does it make any sense of keeping it as a bean for the entire lifespan of the application

IMHO it does, unless it's used very (very) rarely.
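For the "keep it for the entire lifespan of the application" case, a sketch (again assuming HttpClient 4.5) is a single static client that is closed once, from a JVM shutdown hook. The class name is hypothetical. If the client is declared as a Spring `@Bean` instead, Spring by default infers a public `close()` method as the destroy method, so the same cleanup happens when the application context shuts down.

```java
import java.io.IOException;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// Hypothetical application-wide client: created when the class is first used, closed when the JVM exits.
final class AppHttpClient {
    static final CloseableHttpClient INSTANCE = HttpClients.custom()
            .setConnectionManager(new PoolingHttpClientConnectionManager())
            .build();

    static {
        // Runs on normal JVM shutdown (System.exit, SIGINT, SIGTERM), not on a hard kill.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                INSTANCE.close();
            } catch (IOException e) {
                System.err.println("Failed to close HTTP client: " + e);
            }
        }));
    }

    private AppHttpClient() {}
}
```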

relation between the HttpConnection and HttpClient

Each HTTP client can execute multiple HTTP requests. Each request is executed in the context of the client (its configuration, e.g. proxy, concurrency, keep-alive, etc.). Each response has to be released (in HttpClient 4.3+, fully consume its entity and call `close()` on the `CloseableHttpResponse`) in order to free the connection so it can be reused for another request.
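A sketch of that request/response cycle with the HttpClient 4.5 API (the helper name is hypothetical). Reading the entity to the end lets the connection go back to the pool for reuse; closing the `CloseableHttpResponse` in try-with-resources guarantees the connection is released on every path, although if the body was not fully read it is closed rather than reused.

```java
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;

final class Fetcher {
    /** Executes a GET on a shared client and returns the response body as a string. */
    static String fetch(CloseableHttpClient client, String url) throws IOException {
        try (CloseableHttpResponse response = client.execute(new HttpGet(url))) {
            HttpEntity entity = response.getEntity();
            // Reading the entity to EOF hands the connection back to the pool for reuse.
            return entity == null ? "" : EntityUtils.toString(entity);
        } // close() releases the connection even if reading the body threw an exception
    }
}
```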

Thanks. Can you please elaborate on the relation between HttpConnection and HttpClient? What will be "wasted"? Is HttpClient a heavy object? Does it make any sense to keep it as a bean for the entire lifespan of the application (thus avoiding the need to `close()` it)? – Nati, Mar 14 '16 at 20:12

score 1 · Answer 2 · answered Mar 14 '16 at 19:13

I'd say: if it ain't broke, don't fix it. What I mean is: as long as the simplest possible configuration serves your needs, use it, and do not introduce complexity just to take care of future scalability needs. Extra parts mean extra complexity, and that means more bugs. Once you see that the current configuration no longer holds up under increased load, make an estimate and add resources. I hope this helps.

Michael Gantman

Yes, "premature optimization is the root of all evil," said Knuth. But this is not an optimization. I need best practices and a deeper understanding of a significant framework I'm using – Nati Mar 14 '16 at 19:24

score 1 · Answer 3 · edited May 23 '17 at 11:45

I agree with @Michael Gantman on not fixing it.

I would say that fix or not fix depends on your load profile.

Keep or not keep connections?

For example, if you send out 300 requests to 10 servers at once and then don't do anything for an hour, it makes no sense resource-wise to keep TCP/IP connections open for the whole hour (which HTTP/1.1 persistent connections would otherwise do).
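For this burst-then-idle profile, one option (a sketch, assuming HttpClient 4.4+) is to keep reusing connections during the burst but let a background evictor close them once they have sat idle for a while. The 30-second threshold and the class name are assumptions, not values from the answer.

```java
import java.util.concurrent.TimeUnit;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

final class BurstClients {
    /** Client whose pooled connections are closed after 30 s without use (assumed threshold). */
    static CloseableHttpClient create() {
        return HttpClients.custom()
                .evictExpiredConnections()                   // also drop connections past their keep-alive time
                .evictIdleConnections(30L, TimeUnit.SECONDS) // close connections idle for longer than this
                .build();                                    // starts a background evictor thread; close() stops it
    }
}
```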

However, if you talk to a server every 5 seconds, you might consider keeping the connection open. Also, if you want to minimize latency by not re-establishing the connection for every request, you might consider keeping the connections open.
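When you do want connections to stay open between requests, HttpClient 4.x lets you decide how long an idle connection may be kept via a `ConnectionKeepAliveStrategy`. A sketch: honor the server's `Keep-Alive: timeout=N` hint when present, and otherwise fall back to an assumed 60 seconds, because `DefaultConnectionKeepAliveStrategy` returns -1 ("keep indefinitely") when there is no hint. The class name and the fallback value are hypothetical.

```java
import org.apache.http.HttpResponse;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;

final class KeepAliveClients {
    static final long FALLBACK_KEEP_ALIVE_MS = 60_000L; // assumed value

    /** Keep-alive duration in ms: the server's hint if it sent one, otherwise the fallback. */
    static long keepAliveMillis(HttpResponse response, HttpContext context) {
        long hint = DefaultConnectionKeepAliveStrategy.INSTANCE.getKeepAliveDuration(response, context);
        return hint > 0 ? hint : FALLBACK_KEEP_ALIVE_MS;
    }

    static CloseableHttpClient create() {
        return HttpClients.custom()
                .setKeepAliveStrategy(KeepAliveClients::keepAliveMillis)
                .build();
    }
}
```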

For that, you have to use HTTP/1.1. You can find lots of examples, e.g. "DefaultHttpClient keep alive connection on multiple requests".

How many connections to keep?

Again, it depends on your load profile. You said you have 10 servers. If you send data to each server serially, then one HTTP/1.1 connection per server is totally sufficient. However, if you want to do something faster (e.g. uploading two images in parallel), then you can benefit from opening multiple connections to the same server. (Of course, this means that your application is really multithreaded.)
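In HttpClient 4.x the number of parallel connections per server is a pool setting (`PoolingHttpClientConnectionManager` defaults to 2 per route and 20 in total). A sketch for the profile described here: one connection per server for the serial traffic, and a higher limit for a single host that receives parallel uploads. The host name, port and limits are hypothetical.

```java
import org.apache.http.HttpHost;
import org.apache.http.conn.routing.HttpRoute;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

final class PerServerLimits {
    static PoolingHttpClientConnectionManager create(String parallelHost) {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(20);            // upper bound across all servers
        cm.setDefaultMaxPerRoute(1);   // serial traffic: one connection per server is enough
        // Allow up to 4 parallel connections to the host that gets concurrent uploads (plain HTTP, port 80).
        cm.setMaxPerRoute(new HttpRoute(new HttpHost(parallelHost, 80)), 4);
        return cm;
    }
}
```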

Conclusion

If it is not a time-critical application, the easiest thing is not to pool anything and just hit the servers when you have data to send. You could start over-optimizing this and fight for 10 ms of improvement at the cost of serious accidental complexity.
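If you take the "don't pool anything" route with HttpClient 4.x, one way (a sketch; the class name is hypothetical) is to keep a single client but tell it never to reuse connections via `NoConnectionReuseStrategy`. Every request then opens a fresh connection, which is closed once its response has been consumed. The trade-off is a TCP (and possibly TLS) handshake per request in exchange for no idle connections between bursts.

```java
import org.apache.http.impl.NoConnectionReuseStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

final class NonPersistentClients {
    /** Client that closes each connection after its response instead of keeping it alive. */
    static CloseableHttpClient create() {
        return HttpClients.custom()
                .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
                .build();
    }
}
```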

Actually, I'm letting the PoolingHttpClientConnectionManager decide whether to keep the connection open or not. I'm mainly interested in its relation to HttpClient. – Nati, Mar 14 '16 at 19:55
I am wondering how it can figure out when it is worth keeping a connection or dropping it. It does not know your load profile, so it may drop a connection even if it is needed 2 seconds later, or keep a connection even if no communication takes place in the next half an hour. I guess you either ignore this altogether, as we suggest, or you have to dig deeper into the details if you want to optimize this fully, no? – Gee Bee, Mar 14 '16 at 19:57
There are best practices. This is a bit too old, but that's the kind of answer I'm looking for. http://stackoverflow.com/q/1281219/1517569 — Nati, Mar 14 '16 at 20:00

score 0 · Answer 4 · answered Mar 14 '16 at 19:28

In the headers of an HTTP/1.0 client request, you need

Connection: keep-alive

However, that is only a request, and the server you are connecting to might drop the connection anyway.

HTTP/1.1 provides this functionality by default, but the idle time-out (chosen by the server) is often pretty short. Perhaps it can be configured. In any event, if you receive a response with Connection: close in its headers, you must close the connection.

For more details, consult RFC 2616, especially section 8, "Persistent Connections".
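The rules above fit in a small decision function. This is a simplified sketch of RFC 2616 section 8.1, written for illustration with hypothetical class and method names: HTTP/1.1 connections are persistent unless a `close` token appears in the Connection header, while HTTP/1.0 connections are kept only if `keep-alive` was negotiated. It deliberately ignores message framing (a persistent connection also needs a known body length) and proxy-specific rules.

```java
import java.util.Locale;

final class PersistenceRules {
    private PersistenceRules() {}

    /**
     * Whether the connection may stay open after this response.
     * httpVersion is e.g. "HTTP/1.1"; connectionHeader is the Connection header value, or null.
     */
    static boolean canReuse(String httpVersion, String connectionHeader) {
        String header = connectionHeader == null ? "" : connectionHeader.toLowerCase(Locale.ROOT);
        if (hasToken(header, "close")) {
            return false;                 // "Connection: close" always ends the connection
        }
        if ("HTTP/1.1".equals(httpVersion)) {
            return true;                  // HTTP/1.1: persistent by default
        }
        // HTTP/1.0: persistence is opt-in via "Connection: keep-alive"
        return "HTTP/1.0".equals(httpVersion) && hasToken(header, "keep-alive");
    }

    /** Connection is a comma-separated token list, so compare each trimmed token. */
    private static boolean hasToken(String header, String token) {
        for (String part : header.split(",")) {
            if (part.trim().equals(token)) {
                return true;
            }
        }
        return false;
    }
}
```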

So, it would seem that the proper thing to do is to ensure HTTP 1.1 handling (connections are held open by default with 1.1) and to not do anything "special" with HttpClient. According to the second and third sections of the HttpClient homepage, the client will hold a persistent connection as long as possible by default.

My recommendation (if you add a routine, thread, or controller on the client connection side) is to keep all the related requests to a specific server/port within the same scope (all on the same thread, or issued in the same order). This will likely decrease the chance that you run into connection-closing logic, but you cannot really force the connection to stay open (for obvious reasons).
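One way to read the "same thread per server" recommendation, sketched with plain `java.util.concurrent` (the class is hypothetical): give each target server its own single-thread executor, so requests to one server run one at a time and in submission order, while different servers still proceed in parallel. In a real application each task would wrap the actual HttpClient call.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerServerQueue {
    private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

    /** Runs the task after all previously submitted tasks for the same host, on that host's thread. */
    <T> Future<T> submit(String host, Callable<T> task) {
        return executors
                .computeIfAbsent(host, h -> Executors.newSingleThreadExecutor())
                .submit(task);
    }

    /** Lets already queued tasks finish, then stops the per-host threads. */
    void shutdown() {
        executors.values().forEach(ExecutorService::shutdown);
    }
}
```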

Edwin Buck

Thanks. But if I see a connection in `ESTABLISHED` state on netstat, it means most likely the http session is still alive, no? – Nati Mar 14 '16 at 19:31
BTW, your links refer to `HttpClient 3.1`, which differs quite a lot from the 4.5 I'm using – Nati Mar 14 '16 at 19:35
Not really. You're seeing your client's opinion of the connection status. Often this is correct, but it can be wrong. Basically, the only way to really know whether an established connection is usable is to attempt to use it. It will fail if it is unusable, and even if it succeeds, that is again only the "last known status" of the connection, not a contract. Initial connections can (depending on the particulars) sometimes be worse: they may be optimized to "have the connection" without any communication with the server until the first byte is actually transmitted. – Edwin Buck Mar 14 '16 at 19:35
@Nati My document links might be old, but most of this stuff is in the protocol standards, which are even older than the documents I linked to (well, except for the RFC). Connections cannot be trusted to be "truly open" when you see "established". You must transmit data over them (and read a response) for them to be known to work. Eventually time-outs will make everything align with reality, but the data looks good for some time before you find out it is not usable. – Edwin Buck Mar 14 '16 at 19:39
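HttpClient 4.4+ can at least re-check a pooled connection before reusing it if it has been idle for a while, via `PoolingHttpClientConnectionManager.setValidateAfterInactivity`. A sketch (the class name and the 1-second threshold are assumptions). As the comments above say, this check only refreshes the "last known status": it can catch connections the server has already closed, but a request can still fail on a connection that dies right afterwards.

```java
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

final class ValidatingPool {
    static PoolingHttpClientConnectionManager create() {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        // Before leasing a connection that has been idle for more than 1 s, run a quick stale check on it.
        cm.setValidateAfterInactivity(1_000);
        return cm;
    }
}
```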
