
I am in the process of writing a web-app that uses multiple web APIs. For a single request of a single user, my app may need to perform up to 30 HTTP requests to other sites. The site housing the web-app can have hundreds of concurrent users.

I've been looking around trying to figure out which library I should use. I'm looking for a mature project with detailed documentation and tested code, one that will still be around in years to come. Not sure if something like that exists (!)

A couple of questions:

  1. In a case such as the one described above, should I use an asynchronous HTTP client (without threading), or a regular, possibly pooled, HTTP client (with threading)? Asynchronicity relieves my app from using threads, but makes the code more scattered - will the above-mentioned number of requests burden my server too much? (It says here that asynchronous is more scalable.)

  2. Which library is the common one to use? Is it Apache HttpComponents HttpClient or its async counterpart HttpAsyncClient (which is in alpha)? How about jfarcand's AsyncHttpClient?


Okay, let's say I will use threads. After digging around, I realize that spawning threads from within a servlet (in my case, a Struts action) may be a big no-no:

Related questions:

What is recommended way for spawning threads from a servlet in Tomcat

Need help with java web app design to perform background tasks

Can i spawn a thread from a servlet ?

The way I see it, these are my options:

  1. use my own thread pool (the container doesn't manage my threads)
  2. use a WorkManager such as CommonJ (seems like an inactive product)
  3. use a third-party scheduler such as Quartz (may be overkill...?)
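For option 1, a minimal sketch of what a self-managed thread pool could look like, using only JDK classes (`java.util.concurrent` for the pool, with `HttpURLConnection` standing in for a real HTTP client; the `Aggregator` class name and pool size are invented for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class Aggregator {

    // Pool size is an illustrative guess; tune it against your container's limits.
    private final ExecutorService pool = Executors.newFixedThreadPool(10);

    /** Fetches all URLs concurrently and returns the response bodies in order. */
    List<String> fetchAll(List<String> urls) throws Exception {
        List<Callable<String>> tasks = new ArrayList<Callable<String>>();
        for (final String url : urls) {
            tasks.add(new Callable<String>() {
                public String call() throws Exception {
                    return fetch(url);
                }
            });
        }
        List<String> bodies = new ArrayList<String>();
        // invokeAll blocks until every task has completed (or failed).
        for (Future<String> f : pool.invokeAll(tasks)) {
            bodies.add(f.get()); // get() rethrows any exception from the task
        }
        return bodies;
    }

    private String fetch(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        InputStream in = conn.getInputStream();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } finally {
            in.close();
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

Because the pool is shared, hundreds of concurrent users each triggering 30 fetches will queue behind the fixed pool size; that cap is what stops the thread count from exploding, at the cost of extra latency under load.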

I would appreciate any recommendations for this specific use case - aggregating lots of data from different web services (the aggregation is invoked by a single user's single request).

bloodcell

4 Answers


Good question. I would try an asynchronous solution first to see how everything works. The asynchronous solution would be the simplest to implement.

If that doesn't work, try a more threaded model.

I would use HttpClient for making your requests. I've worked with it a lot and use it for any HTTP work I have to do.

hooknc
  • At 30 requests for hundreds of users, async is the way to go, else you will have lots of threads just blocking for a result if you use synchronous (blocking) calls on threads. – David d C e Freitas Apr 19 '11 at 17:42
  • @David, I'm in agreement with you. I think asynchronous is the way to go in the long run. In fact, after giving this problem some more thought, I would consider making the calls to the remote sites part of a persistence layer/store that could cache the results from those external sites. The persistence layer/store could then update the cache based on a specific strategy. We don't know what data is required from each site and how that data is rehashed and then displayed on this particular site. Perhaps using asynchronous calls from the client via JavaScript is the best plan here. – hooknc Apr 19 '11 at 18:07

Using a single thread for each remote HTTP connection with a synchronous HTTP client will probably be easier. I would try this approach first and see if it is fast/scalable enough. For the synchronous approach, Apache HttpClient is a good choice.

If a synchronous solution is not good enough, something like Netty may be a good fit. It uses NIO, so you won't end up with thousands of threads.

sbridges

I do not know of any existing software that will do this for you without being overkill. But you might try splitting things up: that is, separate the fetching of the data from the showing of the result. Since you do not provide further details on the problem at hand, I cannot say whether that would be feasible for you.

Basically, the idea is to create a service that will perform those 30 requests for you and, if possible, combine them into a single response. The client of this data service is the service running on the web: it receives a request from a user and then puts its own request through to your data service. When the data service is ready, it returns its response, either synchronously or asynchronously.

You could program your data service in any language you want, even Java, without being bound to servlets, ensuring that fetching the 30 requests and combining them into a response isn't done by the web server. This could also improve the responsiveness of the web server itself.

In a nutshell: branch off the "difficult" tasks to a specialised service where you can transparently handle parallelism.
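The split described above could be sketched like this, using the JDK's built-in `com.sun.net.httpserver` purely for brevity (the `/aggregate` endpoint, `DataService` class name, and pool size are invented; a real data service could use any HTTP stack):

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * A tiny standalone "data service": the web app calls one endpoint here,
 * and this service fans out to the upstream sites and returns the combined
 * payload, keeping the fan-out off the web server's own threads.
 */
class DataService {

    private final ExecutorService pool = Executors.newFixedThreadPool(30);
    private final List<String> upstreamUrls; // the ~30 sites to aggregate

    DataService(List<String> upstreamUrls) {
        this.upstreamUrls = upstreamUrls;
    }

    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/aggregate", new HttpHandler() {
            public void handle(HttpExchange ex) throws IOException {
                byte[] body;
                try {
                    body = aggregate().getBytes("UTF-8");
                } catch (Exception e) {
                    ex.sendResponseHeaders(502, -1); // an upstream fetch failed
                    ex.close();
                    return;
                }
                ex.sendResponseHeaders(200, body.length);
                ex.getResponseBody().write(body);
                ex.close();
            }
        });
        server.start();
        return server;
    }

    /** Fetches every upstream URL in parallel and concatenates the results. */
    private String aggregate() throws Exception {
        List<Callable<String>> tasks = new ArrayList<Callable<String>>();
        for (final String url : upstreamUrls) {
            tasks.add(new Callable<String>() {
                public String call() throws Exception {
                    return fetch(url);
                }
            });
        }
        StringBuilder combined = new StringBuilder();
        for (Future<String> f : pool.invokeAll(tasks)) {
            combined.append(f.get()).append('\n');
        }
        return combined.toString();
    }

    private static String fetch(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        InputStream in = conn.getInputStream();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } finally {
            in.close();
        }
    }
}
```

The web app then makes a single call to `/aggregate` per user request, and this service can also become the natural place to add caching of the upstream results, as suggested in the comments above.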

Alessandro Vermeulen

I would use Jetty, and in the servlets I'd use the continuation mechanism to free up the thread while waiting for the third-party web requests to complete. This will allow maximum concurrency on your server, as you can have many more suspended requests than threads.

You can use either continuations or the Servlet 3.0 asynchronous API; the end result is the same.

Jim Morris