Java concurrency based on available FREE cpu

QUESTION

How do I scale to use more threads if and only if there is free CPU? Something like a ThreadPoolExecutor that uses more threads when CPU cores are idle, and fewer, or just one, when they are not.

USE CASE

Current situation: My Java server app processes requests and serves results. There is a ThreadPoolExecutor to serve the requests with a reasonable number of max threads following the principle: number of cpu cores = number of max threads. The work performed is cpu heavy, and there's some disk IO (DBs). The code is linear, single threaded. A single request takes between 50 and 500 ms to process. Sometimes there are just a few requests per minute, and other times there are 30 simultaneous. A modern server with 12 cores handles the load nicely. The throughput is good, the latency is ok.

Desired improvement: When there is a low number of requests, as is the case most of the time, many CPU cores are idle. Latency could be improved in this case by running some of the code for a single request multi-threaded. Some prototyping shows improvements, but as soon as I test with a higher number of concurrent requests, the server goes bananas. Throughput goes down, memory consumption goes overboard. With 30 simultaneous requests sharing a queue of 10, at most 10 can run while 20 are waiting, and each of the 10 uses up to 8 threads at once for parallelism. That is up to 80 threads, which seems to be too much for a machine with 12 cores (6 of which are virtual).

This seems to me like a common use case, yet I could not find information by searching.

IDEAS

1) request counting

One idea is to count the requests that are currently being processed. If there is only one or a few, do more parallelism; if there are many, do none and continue single-threaded as before. This sounds simple to implement. Drawbacks: resetting the request counter must not contain bugs (think finally). And it does not actually check the available CPU; maybe another process uses CPU as well. In my case the machine is dedicated to just this application, but still.
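A minimal sketch of the request-counting idea, assuming a fixed core count is passed in and the cores are simply divided among the requests in flight. `RequestGauge`, `handle` and the division heuristic are hypothetical names and choices for illustration, not something from a library. The decrement sits in a `finally` block so the counter is reset even when a request throws.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Hypothetical helper: counts in-flight requests and suggests a per-request parallelism. */
final class RequestGauge {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int cores;

    RequestGauge(int cores) {
        this.cores = cores;
    }

    /** Runs one request; the handler receives the number of threads it may use. */
    <T> T handle(IntFunction<T> handler) {
        int active = inFlight.incrementAndGet();
        try {
            // Assumed heuristic: split the cores among the requests in flight, at least 1 each.
            int parallelism = Math.max(1, cores / active);
            return handler.apply(parallelism);
        } finally {
            inFlight.decrementAndGet(); // always reset, even when the handler throws
        }
    }

    /** Number of requests currently inside handle(). */
    int inFlight() {
        return inFlight.get();
    }
}
```

A request handler would wrap its work in `gauge.handle(threads -> ...)` and size its own parallel section from `threads`. The value is only a snapshot taken when the request starts, so requests arriving later do not shrink the parallelism of requests that are already running.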

2) actual cpu querying

I'd think that the correct approach would be to just ask the CPU, and then decide. Since Java 7 there is OperatingSystemMXBean.getSystemCpuLoad(), see http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad() but I can't find any web page that mentions getSystemCpuLoad and ThreadPoolExecutor, or a similar combination of keywords, which tells me that's not a good path to go down. The JavaDoc says "Returns the "recent cpu usage" for the whole system", and I'm wondering what "recent cpu usage" means, how recent that is, and how expensive that call is.
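For reference, a small sketch of what querying the load could look like, assuming a JVM that exposes the `com.sun.management` extension interface (HotSpot-based JDKs do). `CpuProbe` and `suggestedParallelism` are hypothetical, and mapping the idle share of the cores to a thread count is only an assumed heuristic. The method returns a value between 0.0 and 1.0, or a negative value when the load is not available; newer JDKs mark `getSystemCpuLoad()` as deprecated in favour of `getCpuLoad()`.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical probe around the JDK-specific com.sun.management.OperatingSystemMXBean. */
final class CpuProbe {

    /** Returns the system CPU load in [0.0, 1.0], or a negative value if unavailable. */
    static double systemCpuLoad() {
        java.lang.management.OperatingSystemMXBean os =
                ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getSystemCpuLoad();
        }
        return -1.0; // the extension interface is missing on this JVM
    }

    /** Assumed heuristic: use the idle share of the cores, but at least one thread. */
    static int suggestedParallelism(double load, int cores) {
        if (load < 0.0) {
            return 1; // unknown load: stay single-threaded
        }
        return Math.max(1, (int) Math.floor((1.0 - load) * cores));
    }

    public static void main(String[] args) {
        double load = systemCpuLoad();
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("load=" + load + " threads=" + suggestedParallelism(load, cores));
    }
}
```

The sketch does not answer how recent the reported value is or how expensive the call is; both would have to be measured on the target machine.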

UPDATE

I had left this question open for a while to see if more input would come. Nope. Although I don't like the "no-can-do" answer to technical questions, I'm going to accept Holger's answer now. He has a good reputation, good arguments, and others have approved his answer. I had experimented with idea 2 a bit myself. I queried getSystemCpuLoad() in tasks to decide how large their own ExecutorService could be. As Holger wrote, when there is a SINGLE ExecutorService, resources can be managed well. But as soon as tasks start their own tasks, they cannot; it didn't work out for me.

Variation on 1: use ThreadPool's `getActiveCount()` to measure current pressure. A counter is still needed for requests that (are going to) run multi-threaded as too many of these in the (task-queue of the) ThreadPool will result in bananas. As a precaution you could regularly (every 5 seconds or so) check on system load in case some (unexpected) heavy scheduled process kicks in (e.g. backup). — vanOekel, Jun 25 '14 at 14:10
I'm also thinking about your idea 2). One of the main challenges is the latency of com.sun.management.OperatingSystemMXBean#getSystemCpuLoad(). I ran a little test which shows that the latency of this method on my Windows 7 box using Java 8 update 45 is **500 ms**. This is half a second... IMHO this is way too slow to decide at runtime whether to increase or decrease e.g. the number of threads of a ThreadPoolExecutor. — Peti, Jul 22 '15 at 06:45
Instead of OperatingSystemMXBean#getSystemCpuLoad() you could use hyperic sigar CpuPerc#getCombined() (Apache 2 licence). The latency is a little bit better: Around **50 ms** on my Windows 7 box using Java 8 update 45. — Peti, Jul 22 '15 at 07:00

score 4 · Accepted Answer · answered Jun 26 '14 at 11:58

There is no way of limiting based on “free CPU” and it wouldn’t work anyway. The information about “free CPU” is outdated as soon as you get it. Suppose you have twelve threads running concurrently, all detecting at the same time that there is one free CPU core and all deciding to schedule a sub-task…

What you can do is limit the maximum resource consumption, which works quite well when using a single ExecutorService with a maximum number of threads for all tasks.

The tricky part is the dependency of the tasks on the results of their sub-tasks, which are enqueued at a later time and might still be pending due to the limited number of worker threads.

This can be handled by revoking the parallel execution if the task detects that its sub-task is still pending. For this to work, create a FutureTask for the sub-task manually and schedule it with execute rather than submit. Then proceed within the task as normal and, at the place where you would perform the sub-task in a sequential implementation, check whether you can remove the FutureTask from the ThreadPoolExecutor. Unlike cancel, this works only if the sub-task has not started yet and hence is an indicator that there are no free threads. So if remove returns true, you can perform the sub-task in-place, letting all other threads perform tasks rather than sub-tasks. Otherwise, you can wait for the result.

At this point it’s worth noting that it is ok to have more threads than CPU cores if the tasks involve I/O operations (or may wait for sub-tasks). The important point here is to have a limit.

FutureTask<Integer> coWorker = new FutureTask<>(/* callable wrapping sub-task */);
executor.execute(coWorker);

// proceed in the task’s sequence

if (executor.remove(coWorker)) coWorker.run(); // still queued: no free thread, do it in-place
Integer subTaskResult = coWorker.get();        // otherwise wait for the worker thread

// proceed
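To make the fragment above concrete, here is a self-contained sketch under stated assumptions: the "request" just sums 1..n and hands the upper half to a sub-task, and `InPlaceFallbackDemo`, `process`, `sum` and `runOnSingleWorker` are hypothetical names. The pool in `runOnSingleWorker` has a single worker on purpose: the request occupies that worker, so the sub-task can never start, `remove` returns true and the request runs the sub-task itself instead of waiting forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class InPlaceFallbackDemo {

    /** Hypothetical request: sums 1..n, handing the upper half to a sub-task. */
    static int process(ThreadPoolExecutor executor, int n) {
        FutureTask<Integer> coWorker = new FutureTask<>(() -> sum(n / 2 + 1, n));
        executor.execute(coWorker); // execute, not submit, so remove() finds this very object

        int lower = sum(1, n / 2); // the request's own sequential part

        if (executor.remove(coWorker)) {
            coWorker.run(); // still queued, so no worker was free: do it in-place
        }
        try {
            return lower + coWorker.get(); // done already, or wait for the worker
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    static int sum(int from, int to) {
        int s = 0;
        for (int i = from; i <= to; i++) {
            s += i;
        }
        return s;
    }

    /** Runs one request on a pool whose only worker is taken by the request itself. */
    static int runOnSingleWorker(int n) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        try {
            Callable<Integer> request = () -> process(executor, n);
            return executor.submit(request).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnSingleWorker(100));
    }
}
```

With more workers than running requests, the sub-task may already have been picked up when `remove` is called; in that case `remove` returns false and `get` simply waits for the result.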

score 0 · Answer 2 · edited May 23 '17 at 11:44

It sounds like the ForkJoinPool introduced in Java 7 would be exactly what you need. The ForkJoinPool is specifically designed to keep all your CPUs exactly busy, meaning that there are as many threads as there are CPUs and that all those threads are working and not blocking (for the latter, make sure that you use ManagedBlockers for DB queries).

In a ForkJoinTask there is the method getSurplusQueuedTaskCount for which the JavaDoc says "This value may be useful for heuristic decisions about whether to fork other tasks." and as such serves as a better replacement for your getSystemCpuLoad solution to make decisions about task decompositions. This allows you to reduce the number of decompositions when system load is high and thus reduce the impact of the task decomposition overhead.
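As a rough illustration of that heuristic, the sketch below sums an array with a RecursiveTask and stops forking once the current worker has more than a few surplus queued tasks. `RangeSum`, the chunk threshold and the surplus limit of 3 are assumptions made for this example, not recommended values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical CPU-bound sub-task: sums data[from..to). */
final class RangeSum extends RecursiveTask<Long> {
    private static final int SEQUENTIAL_THRESHOLD = 1_000; // assumed minimum chunk size
    private static final int MAX_SURPLUS = 3;              // assumed limit for surplus tasks

    private final long[] data;
    private final int from;
    private final int to;

    RangeSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        int length = to - from;
        // Stop decomposing when the chunk is small, or when this worker already holds
        // more queued tasks than other workers are likely to steal (pool is busy).
        if (length <= SEQUENTIAL_THRESHOLD || getSurplusQueuedTaskCount() > MAX_SURPLUS) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = from + length / 2;
        RangeSum right = new RangeSum(data, mid, to);
        right.fork();                                        // offer the right half to other workers
        long left = new RangeSum(data, from, mid).compute(); // work on the left half ourselves
        return left + right.join();
    }

    /** Convenience entry point using the common pool (Java 8+). */
    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(data, 0, data.length));
    }
}
```

When many requests run in the same pool, the workers' queues fill up, the surplus count rises and each request decomposes less, which is the behaviour asked for in the question without querying the CPU at all.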

Also see my answer here for a more in-depth explanation of the principles of Fork/Join pools.
