
It's my first SO question so be patient with me :)

I'm trying to create a service that:

  1. Receives HTTP GET requests containing a URL to query
  2. For a single GET request the service extracts the URL
  3. Queries a local DB about the URL
  4. If a result is found in the DB, it will be returned to the client; if not, the service will need to query some external services (which may take a relatively long time to respond)
  5. Return the result of the URL to the client

I'm running this on a virtual machine with Tomcat 7 and Spring. I'll apologize in advance and mention that I'm pretty new to Tomcat.

Anyway, I'm expecting a lot of concurrent GET requests to this service (hundreds of thousands of simultaneous requests). What I'm basically trying to achieve is to make this service as scalable as possible (and if that's not possible, then at least a service that can handle hundreds of thousands of simultaneous requests).

I've been reading A LOT about asynchronous request handling in services, and especially in Tomcat, but some things are still unclear to me:

  1. From the official Tomcat website it seems that Tomcat has a number of acceptor threads and a number of worker threads. If so, why should I use AsyncContext? What's the benefit of releasing a Tomcat worker thread and occupying a different thread in my application to do the exact same actions? (There's still one active thread in the system.)
  2. Somewhat similar to the first question, but are there any benefits to creating the AsyncContext and processing it with a different thread (a thread from a thread pool created in my application)?
  3. Regarding the same issue, I've seen here that I can also return a Callable or a DeferredResult and process it with either one of Tomcat's threads or with one of my own threads. Are there any benefits to returning a Callable or using a DeferredResult over just processing the AsyncContext from the request?
  4. Also, if I decide to return a Callable, from what thread pool does Tomcat get the thread to process it? Are the threads used here the same worker threads from Tomcat that I previously mentioned? If so, what benefit do I get from releasing one Tomcat worker thread and using a different one instead?
  5. I've seen from Oracle's documentation that I can pass the AsyncContext a Runnable object that will be processed concurrently. Where do the threads used to execute this Runnable come from? Do I have any control over them? Also, are there any benefits to passing the AsyncContext a Runnable over just passing the AsyncContext to one of my own threads? (I've put a small sketch of both options right after this list.)
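
To make questions 2 and 5 concrete, here's roughly the kind of code my colleagues and I are arguing about. It's only a sketch - the servlet, the pool size, and the lookup logic are all made up:

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/lookup", asyncSupported = true)
public class LookupServlet extends HttpServlet {

    // the thread pool from question 2 - created by my application, not Tomcat
    private final ExecutorService myPool = Executors.newFixedThreadPool(100);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        final AsyncContext ctx = req.startAsync(); // releases Tomcat's worker thread
        final String url = req.getParameter("url");

        // Option from question 5: hand the container a Runnable
        // ctx.start(new Runnable() { public void run() { process(ctx, url); } });

        // Option from question 2: run the same work on my own pool
        myPool.submit(new Runnable() {
            @Override
            public void run() {
                process(ctx, url);
            }
        });
    }

    private void process(AsyncContext ctx, String url) {
        try {
            // query the local DB, fall back to the external services on a miss...
            ctx.getResponse().getWriter().write("result for " + url);
        } catch (IOException e) {
            // real code would log this
        } finally {
            ctx.complete(); // finishes the response either way
        }
    }
}
```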

I apologize for asking so many questions about the same things, but my colleagues and I have been arguing over them for more than a week without reaching any concrete answer.

I have one more general question: what do you think is the best way to make the service I described scalable (putting aside adding more machines for the moment)? Could you post any examples or references for the proposed solution?

I'd post more of the links I've been looking at, but my current reputation doesn't allow it. I'll be grateful for any understandable references or concrete examples, and I'll obviously be happy to clarify any relevant issue.

Cheers!

Gideon
  • In the link you posted, there's a 'Motivation' section. Plus, 'one active thread in the system' isn't true for multicore systems. – Victor Sorokin Jan 13 '15 at 15:06
  • Regarding scalability: the current approach, employed by all major services, is exactly to scale using more machines. Of course, the prerequisite is that you avoid bottlenecks in the application(s) running on a single machine. However, even if a single-host app is optimal, you still have scalability limits imposed by hardware. That's why more machines are better, given that you design your app with multi-host usage in mind. – Victor Sorokin Jan 13 '15 at 15:10
  • Single-host scalability means lots of caching and using all available cores. – Victor Sorokin Jan 13 '15 at 15:11
  • In the motivation section it says that if my client needs a result (and my clients need a result) we can decouple processing from the servlet container thread. One of my questions is: why? I can either create my own thread pool and use it, or I can just increase the number of worker threads in Tomcat. What benefits are there to creating my own thread pool instead of increasing the number of threads in Tomcat's pool? – Gideon Jan 13 '15 at 16:07
  • Regarding multiple machines: when we need to use several machines, we'll definitely do it. First we want to make sure that each machine we create is optimized; only then will we start thinking about more machines. I'm still in the phase of optimizing the single machine :) Thanks for the comments, Victor! – Gideon Jan 13 '15 at 16:10
  • It'll pay off if you consider multi-host deployment from the start. It usually requires _very_ substantial refactoring to add this as an afterthought. I, for one, use Hazelcast for multi-host scaling :) – Victor Sorokin Jan 13 '15 at 16:13
  • It may be beneficial to re-post this question to 'programmers.stackexchange', since this is a bit too broad for SO :) – Victor Sorokin Jan 13 '15 at 16:31
  • Sorry, I just saw your comment about decoupling. The reason is that the container thread is concerned with HTTP I/O, whereas the background thread should be concerned with some other task (DB I/O, computation, etc.). If you don't decouple, these long ops will be utilizing servlet HTTP threads, decreasing the number of simultaneous clients (browsers) your server can handle. – Victor Sorokin Jan 13 '15 at 16:54
  • Victor, if I need more simultaneous clients, why can't I just increase the number of acceptor and worker threads in Tomcat? If my machine can, for example, support only 150 simultaneous threads (not saying it does, it's just an example), why don't I just set the worker threads to 150 and allow several acceptors? Since these numbers are configurable, I fail to see the need for another thread pool (I'm probably wrong, just trying to understand why). – Gideon Jan 13 '15 at 17:14
  • I'm sorry if it sounds like I'm looking for an argument; it's just that I've read every possible tutorial online without finding any real answers :) – Gideon Jan 13 '15 at 17:15
  • Because tomcat worker threads serve browsers, and not every browser request will be some long-running op (imagine serving static picture or something else not very expensive). Using separate pool you ensure that such quick op won't be delayed by all tomcat threads being used for long running op initiated by some other clients. – Victor Sorokin Jan 13 '15 at 17:21
  • I agree with the statement that if I have some long-running operations and some short-running operations, I need different pools for the different types of operations. However, in my servlet it seems like all the clients will receive the exact same treatment: query the DB, query the external service, return the result. So in my case, is it still useful to create another thread pool? (I can't predict how long each operation is going to take - it depends on whether the result is found in the DB or not.) – Gideon Jan 13 '15 at 17:35
  • I accidentally came across [this](http://stackoverflow.com/questions/7457190/how-are-threads-allocated-to-handle-servlet-request). Does this mean that if I have Tomcat 7 with Spring MVC, each request that arrives at my handleRequest method is actually a thread per client? I was under the impression that Tomcat and Spring manage requests differently, by accepting them and putting them in a separate thread pool for me to use. – Gideon Jan 13 '15 at 18:47
  • It's thread per request, not per client. By default, incoming requests are served by worker threads, and there is no separate pool by default. – Victor Sorokin Jan 13 '15 at 19:49

1 Answer


There are a lot of questions packed into this, but I'll try to address some of them.

Asynchronous I/O is a good thing, especially on servers that serve large volumes of requests - it allows you to use fewer threads to process more requests. In the case of a proxy such as you are writing, you really want your HTTP client (that makes the requests to foreign URLs) to be asynchronous as well, so that neither processing the request nor receiving the remote response involves blocking I/O.
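
To illustrate what an asynchronous HTTP client looks like, here's a minimal sketch using Ning's AsyncHttpClient, one callback-based client available for Java. The URL and the handler body are placeholders, not code from my project:

```java
import java.util.concurrent.Future;

import com.ning.http.client.AsyncCompletionHandler;
import com.ning.http.client.AsyncHttpClient;
import com.ning.http.client.Response;

public class NonBlockingFetch {

    public static void main(String[] args) throws Exception {
        AsyncHttpClient client = new AsyncHttpClient();
        // execute() returns immediately - no thread sits blocked
        // waiting on the remote server
        Future<Response> whenDone = client.prepareGet("http://example.com/")
            .execute(new AsyncCompletionHandler<Response>() {
                @Override
                public Response onCompleted(Response response) {
                    // runs on an I/O thread once the full response has arrived
                    System.out.println("Got status " + response.getStatusCode());
                    return response;
                }

                @Override
                public void onThrowable(Throwable t) {
                    t.printStackTrace();
                }
            });
        whenDone.get(); // only to keep this little demo alive; a server wouldn't block here
        client.close();
    }
}
```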

That said, you may have a harder time doing this stuff with Tomcat or Java EE servers in general, which have had asynchronous I/O bolted onto them as an afterthought, than using a framework like Netty that is asynchronous from the ground up. As the author of a framework which builds on top of Netty, I'm a bit biased.

To demonstrate how little code you'd need to do what you describe, I wrote a small server that does it in three Java source files and put it on GitHub. It builds a standalone JAR you can run with java -jar to try it out, and I tried to comment it clearly.

What it comes down to is that networked applications spend most of their time waiting for I/O to happen. In the case of a proxy in particular, with traditional threaded I/O, you get a request, and the thread that received the request is responsible for answering it synchronously. That means that if it has to make a network request to another server, the thread is blocked waiting for the answer to come back from the remote server, and can't be used for anything else. So if you have 10 threads, and all of them are waiting on responses, your server can't answer any more requests until one of them finishes and frees up a thread.

With asynchronous I/O, you get a callback when some I/O completes. In other words, instead of standing still until the OS flushes your data to the socket and out the network card, your code simply gets a friendly tap on the shoulder when there is something to do (like a response arriving for your proxy request). While your code is waiting for that HTTP request to complete, the thread that sent the proxy request is free to handle another request. That means one thread can do a little work on one request, a little on another, and another, and eventually finish the first request. Since threads are a finite resource provided by your operating system, this allows you to do a lot more with a lot less hardware.

As to Callable vs. DeferredResult: using a Callable just moves around when the work happens (the Callable gets executed later, on some thread or other, but is still expected to return a result synchronously). DeferredResult sounds more like what you need, since it allows your code to go off and do whatever work it wants, and then set the result (triggering completion of the response) whenever it has something to set.
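
For instance, here's a minimal Spring MVC sketch of the DeferredResult style - the controller, the pool size, and the lookup method are hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.async.DeferredResult;

@RestController
public class UrlLookupController {

    // your own pool, so slow lookups never occupy a container thread
    private final ExecutorService pool = Executors.newFixedThreadPool(100);

    @RequestMapping("/lookup")
    public DeferredResult<String> lookup(@RequestParam("url") final String url) {
        final DeferredResult<String> result = new DeferredResult<String>();
        pool.submit(new Runnable() {
            @Override
            public void run() {
                // check the local DB, hit the external services on a miss...
                result.setResult(lookupSomehow(url)); // completes the HTTP response
            }
        });
        return result; // the container thread is released right here
    }

    private String lookupSomehow(String url) {
        return "..."; // placeholder for the real DB / external-service logic
    }
}
```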

Honestly, I think if you want to implement this really efficiently, you'd be better off staying away from the Java EE stack - so much of it has the assumption that I/O is synchronous baked in that trying to do async stuff with it is swimming upstream (for example, JDBC has synchronous I/O baked into its bones - if you really want this to scale and you want to use an SQL database, you'd be better off with something like this).

For another example of using Netty for this sort of thing, see the tiny-maven-proxy project - the code is less pretty, but it shows an example of doing an HTTP proxy where the response body is fed to the client chunk-by-chunk, as it arrives - so you never actually pull the full response body into memory, meaning even requests with huge responses won't run the proxy out of memory. Tiny-maven-proxy also caches on the filesystem. I didn't do those things in the demo because it would have made the code more complicated.

Tim Boudreau