QUESTION
How do I scale to use more threads if and only if there is free CPU? Something like a ThreadPoolExecutor that uses more threads when CPU cores are idle, and fewer (or just one) when they are not.
USE CASE
Current situation: My Java server app processes requests and serves results. A ThreadPoolExecutor serves the requests with a reasonable maximum number of threads, following the principle: number of CPU cores = maximum number of threads. The work performed is CPU-heavy, and there's some disk IO (DBs). The code is linear and single-threaded. A single request takes between 50 and 500 ms to process. Sometimes there are just a few requests per minute, and other times there are 30 simultaneous ones. A modern server with 12 cores handles the load nicely. The throughput is good, the latency is ok.
Desired improvement: When the number of requests is low, as is the case most of the time, many CPU cores are idle. Latency could be improved in this case by running some of the code for a single request multi-threaded. Some prototyping shows improvements, but as soon as I test with a higher number of concurrent requests, the server goes bananas. Throughput goes down and memory consumption goes overboard. With 30 simultaneous requests sharing a queue of 10 (meaning that at most 10 run while 20 wait), and each of the 10 using up to 8 threads at once for parallelism, the load seems to be too much for a machine with 12 cores (6 of which are virtual).
This seems to me like a common use case, yet I could not find information by searching.
IDEAS
1) Request counting: One idea is to count the number of requests currently being processed. If it is 1 or low, use more parallelism; if it is high, use none and continue single-threaded as before. This sounds simple to implement. The drawbacks: decrementing the request counter must be bug-free (think finally blocks), and the counter does not actually check available CPU, so it would miss another process using CPU too. In my case the machine is dedicated to just this application, but still.
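To make idea 1 concrete, here is a minimal sketch of what I have in mind. The class name RequestGate, the handle method, and the "split the cores evenly among active requests" rule are all my own assumptions for illustration, not an established pattern. The counter is decremented in a finally block so a failing request cannot leak a count.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Hypothetical helper: counts in-flight requests and suggests a per-request parallelism. */
final class RequestGate {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int cores;

    RequestGate(int cores) {
        this.cores = cores;
    }

    /** Runs one request; the handler is told how many threads it may use. */
    <T> T handle(IntFunction<T> handler) {
        int active = inFlight.incrementAndGet();
        try {
            return handler.apply(parallelismFor(active));
        } finally {
            // Always runs, even if the handler throws, so the count cannot drift.
            inFlight.decrementAndGet();
        }
    }

    /** Assumed policy: divide the cores evenly among active requests, never below 1. */
    int parallelismFor(int active) {
        return Math.max(1, cores / active);
    }

    int inFlight() {
        return inFlight.get();
    }
}
```

A request handler would then be wrapped as gate.handle(threads -> process(request, threads)), where process is whatever the existing request code is. Note that the parallelism is decided once when the request starts, so a burst arriving right afterwards can still oversubscribe the cores for a short while.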
2) Actual CPU querying: I'd think that the correct approach would be to just ask the CPU, and then decide. Since Java 7 there is OperatingSystemMXBean.getSystemCpuLoad() (see http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()), but I can't find any webpage that mentions getSystemCpuLoad and ThreadPoolExecutor, or a similar combination of keywords, which tells me that's not a good path to take. The JavaDoc says "Returns the "recent cpu usage" for the whole system", and I'm wondering what "recent cpu usage" means, how recent that is, and how expensive that call is.
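For idea 2, this is roughly the kind of query I mean. It is only a sketch: the class name CpuHeadroom and the mapping from load to thread count are my own assumptions, and it does nothing to answer how recent the "recent cpu usage" is or how costly the call is. It relies on the com.sun.management extension, so it assumes a HotSpot/OpenJDK-style JVM. getSystemCpuLoad() returns a value between 0.0 and 1.0, or a negative value if the load is not available; on JDK 14 and later it is deprecated in favor of getCpuLoad().

```java
import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;

/** Hypothetical helper: turns the system CPU load into a suggested thread count. */
final class CpuHeadroom {
    // Assumes the JVM provides the com.sun.management extension of the platform bean.
    private static final OperatingSystemMXBean OS =
            ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);

    /** System-wide CPU load in [0.0, 1.0], or a negative value if unavailable. */
    static double systemLoad() {
        return OS.getSystemCpuLoad();
    }

    /**
     * Assumed policy: use as many threads as there are idle cores (rounded down),
     * clamped to [1, maxThreads]. An unknown load falls back to a single thread.
     */
    static int threadsFor(double load, int cores, int maxThreads) {
        if (Double.isNaN(load) || load < 0.0) {
            return 1;
        }
        int idleCores = (int) Math.floor((1.0 - load) * cores);
        return Math.max(1, Math.min(maxThreads, idleCores));
    }
}
```

A task would call CpuHeadroom.threadsFor(CpuHeadroom.systemLoad(), Runtime.getRuntime().availableProcessors(), 8) before deciding how much to parallelize. One obvious weakness: several requests that start at the same moment all see the same idle machine and all choose the maximum.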
UPDATE
I had left this question open for a while to see if more input would come. Nope. Although I don't like the "no-can-do" answer to technical questions, I'm going to accept Holger's answer now. He has good reputation, good arguments, and others have approved his answer. I had experimented with idea 2 a bit myself: I queried getSystemCpuLoad() in tasks to decide how large their own ExecutorService could be. As Holger wrote, when there is a SINGLE ExecutorService, resources can be managed well. But as soon as tasks start their own tasks, they cannot. It didn't work out for me.