
I'm trying to understand the best approach to managing HTTP connections with Apache HttpClient in an existing implementation (not mine). I think the way it is implemented right now is wasteful, but since I'm not that familiar with this library, I'd like to confirm my thoughts.

Consider this scenario:

  • I have a webapp running in Tomcat, which means it is a multi-threaded environment.
  • I need to reach a REST web service from class Rest. A new instance of the Rest class is created for each request to my app, in order to call the service I need.

Option 1 (Currently implemented): class Rest instantiates a new PoolingHttpClientConnectionManager and makes the request.

Personally, I think this is a total waste. Only one thread ever accesses a given manager instance, so there's really no benefit to this approach. It's actually worse, since I'm assuming this manager could be expensive to create (?). In reality we end up creating many PoolingHttpClientConnectionManager instances, one per thread (one per request).

Option 2: class Rest could instantiate only one PoolingHttpClientConnectionManager as a sort of singleton.

Then every Tomcat thread would reuse the same connection manager, and only a new HttpClient would be created per thread. I think this gets all the benefits of the pool, such as limiting the number of connections and reusing them. But I don't know if this is a good use of the manager (my guess is that it should be OK, since the whole purpose of this connection manager is to work in multi-threaded environments).
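A minimal sketch of the option 2 structure, written so it runs without the HttpClient jar. `SharedPool` is a hypothetical stand-in for PoolingHttpClientConnectionManager that only models the connection cap (with a `Semaphore`); the names `Rest`, `Option2Demo`, the limit of 20 and the 8 worker threads are illustrative assumptions. What matters is where the pool lives: in a `static final` field, created once, while `Rest` is still instantiated per request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for PoolingHttpClientConnectionManager: it only caps
// how many "connections" can be in use at the same time.
final class SharedPool {
    static final AtomicInteger INSTANCES = new AtomicInteger();
    private final Semaphore permits;

    SharedPool(int maxTotal) {
        permits = new Semaphore(maxTotal);
        INSTANCES.incrementAndGet();
    }

    // Waits for a free slot, like a bounded pool making callers queue.
    void execute(Runnable call) {
        permits.acquireUninterruptibly();
        try {
            call.run();
        } finally {
            permits.release();
        }
    }
}

// Option 2: a Rest object is still created per request, but the pool is created
// once per class loader (typically once per webapp) and shared by all threads.
final class Rest {
    private static final SharedPool POOL = new SharedPool(20);

    void callService() {
        POOL.execute(() -> { /* the actual HTTP call would go here */ });
    }
}

final class Option2Demo {
    // Simulates Tomcat worker threads, each handling one request with a new Rest.
    static int simulate(int requests) {
        ExecutorService workers = Executors.newFixedThreadPool(8);
        for (int i = 0; i < requests; i++) {
            workers.execute(() -> new Rest().callService());
        }
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return SharedPool.INSTANCES.get(); // how many pools were ever created
    }

    public static void main(String[] args) {
        System.out.println("pools created: " + simulate(100));
    }
}
```

With the real library (HttpClient 4.x), the equivalent would be a single static PoolingHttpClientConnectionManager, sized with `setMaxTotal` and `setDefaultMaxPerRoute`, used either through one shared `CloseableHttpClient` or through per-request clients built with `HttpClientBuilder.setConnectionManagerShared(true)` (4.4 and later), so that closing a per-request client does not shut down the shared manager. The manager itself would then be closed once, when the webapp stops.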

Option 3: class Rest could instantiate one new instance of BasicHttpClientConnectionManager.

I tried this and it works just fine. This means each thread has its own single-connection manager. Even though this manager holds a single connection, because we have one manager per thread we still get parallel execution.

I think the downside of this approach is that there are no limits: if my app gets too many requests, we create a new manager every time, and we don't reuse connections to the same route.

I'd appreciate any thoughts you can give me on this matter. I've seen a lot of examples, but they are always simple ones run from a main method that create threads explicitly. I didn't see any example from an application server such as Tomcat.

Gabriel Espinel
  • Option 2. You want connections shared as widely as possible. Multiple pools just waste connections, as the idle ones aren't shared. – user207421 Jun 13 '19 at 01:30

3 Answers


Option 2 is strongly recommended.

ok2c

According to the Apache HttpClient documentation, option 2 is the most sensible one.

First, it says:

The process of establishing a connection from one host to another is quite complex and involves multiple packet exchanges between two endpoints, which can be quite time consuming. The overhead of connection handshaking can be significant, especially for small HTTP messages. One can achieve a much higher data throughput if open connections can be re-used to execute multiple requests.

HTTP/1.1 states that HTTP connections can be re-used for multiple requests per default. HTTP/1.0 compliant endpoints can also use a mechanism to explicitly communicate their preference to keep connection alive and use it for multiple requests. HTTP agents can also keep idle connections alive for a certain period of time in case a connection to the same target host is needed for subsequent requests. The ability to keep connections alive is usually referred to as connection persistence. HttpClient fully supports connection persistence.

From that paragraph we can conclude that opening new HTTP connections every time we want to make an HTTP request is a very bad idea, and option 1 in your question, which creates a new connection manager (and therefore new connections) per request, is not the best way to go.

And later under "Pooling connection manager" it says:

PoolingHttpClientConnectionManager is a more complex implementation that manages a pool of client connections and is able to service connection requests from multiple execution threads. Connections are pooled on a per route basis. A request for a route for which the manager already has a persistent connection available in the pool will be serviced by leasing a connection from the pool rather than creating a brand new connection.
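To make "pooled on a per route basis" concrete, here is a deliberately simplified, hypothetical model of per-route leasing. It is not HttpClient's implementation: routes are plain strings instead of `HttpRoute`, connections are fake objects, and there are no limits or timeouts. It only shows the rule quoted above: an idle connection for the same route is leased again, while a different route gets a new one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of per-route pooling (no limits, no timeouts).
final class RoutePool {
    // A fake "connection" that only remembers its route and an id.
    static final class Conn {
        final String route;
        final int id;

        Conn(String route, int id) {
            this.route = route;
            this.id = id;
        }
    }

    private final Map<String, Deque<Conn>> idle = new HashMap<>();
    private int opened = 0;

    // Reuse an idle connection for this route if one exists, otherwise "open" a new one.
    synchronized Conn lease(String route) {
        Deque<Conn> free = idle.get(route);
        if (free != null && !free.isEmpty()) {
            return free.pop();
        }
        opened++;
        return new Conn(route, opened);
    }

    // Put the connection back on its route's idle list so later requests can reuse it.
    synchronized void release(Conn c) {
        idle.computeIfAbsent(c.route, r -> new ArrayDeque<>()).push(c);
    }

    // Number of connections that had to be "opened" (i.e. handshakes paid for).
    synchronized int openedCount() {
        return opened;
    }
}
```

If a new pool is created for every request, as in option 1, the idle map always starts empty, so every lease opens a new connection and nothing is ever reused.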

So, after reading this paragraph we can conclude that yes, it makes sense to have a single connection pool shared by all threads of the application. So, ideally, you instantiate it once and share it everywhere you need to obtain an HTTP connection.

Finally, regarding option 3, the documentation says:

BasicHttpClientConnectionManager is a simple connection manager that maintains only one connection at a time. Even though this class is thread-safe it ought to be used by one execution thread only. BasicHttpClientConnectionManager will make an effort to reuse the connection for subsequent requests with the same route. It will, however, close the existing connection and re-open it for the given route, if the route of the persistent connection does not match that of the connection request. If the connection has already been allocated, then java.lang.IllegalStateException is thrown.
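The three rules in that paragraph can be modeled in a few lines. This is a hypothetical sketch of the described behavior, not the library's code: `SingleConnManager` and its methods are invented for illustration, a connection is represented only by an integer id, and routes are plain strings.

```java
// Hypothetical model of the rules quoted for BasicHttpClientConnectionManager:
// one connection, reused for the same route, replaced for a different route,
// and IllegalStateException if it is requested while still allocated.
final class SingleConnManager {
    private String route;      // route of the current connection, or null if none yet
    private int connectionId;  // id of the current connection
    private boolean leased;
    private int opened;

    synchronized int lease(String requestedRoute) {
        if (leased) {
            throw new IllegalStateException("Connection is still allocated");
        }
        if (!requestedRoute.equals(route)) {
            // Close the old connection (if any) and open a new one for this route.
            route = requestedRoute;
            connectionId = ++opened;
        }
        leased = true;
        return connectionId;
    }

    synchronized void release() {
        leased = false;
    }

    synchronized int openedCount() {
        return opened;
    }
}
```

Because each instance holds a single connection, creating one manager per request (option 3) gives every request its own fresh connection: nothing is reused across requests and nothing caps the total number of open connections.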

So option 3 works, but it definitely does not sound better than option 2 in terms of reusing expensive resources.

Edwin Dalorzo

I just read a question related to this, but about C#. I'm not an expert on this matter, but basically option 2 is recommended. Creating a new connection manager per connection can result in poor performance, because it creates a new instance just for one new connection (it might only request a connection without actually using it, and that can exhaust the HttpClient manager). That reason alone should be enough to decide between your options. Here's the link to the thread link...

Hope this helps.

Francis G