4

Let's imagine that we have n independent blocking IO tasks, e.g. REST calls to another server, whose answers we then need to combine. Each task can take over 10 seconds.

  1. We can process them sequentially and spend ~n*10 seconds in total:

    Task1Ans task1 = service1.doSomething();
    Task2Ans task2 = service2.doSomething();
    ...
    return result;
    
  2. Another strategy is to process them in parallel using CompletableFuture and spend ~10 seconds on all tasks:

    CompletableFuture<Task1Ans> task1Cs = CompletableFuture.supplyAsync(() -> service1.doSomething(), bestExecutor);
    CompletableFuture<Task2Ans> task2Cs = CompletableFuture.supplyAsync(() -> service2.doSomething(), bestExecutor);
    return CompletableFuture.allOf(task1Cs, task2Cs)
       .thenApply(nothing -> {
           ...
           // combine task1, task2 into result object
           return result;
       }).join();
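A self-contained version of this sketch, in which the sleeps and the `blockingCall` helper are hypothetical stand-ins for the slow REST calls:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelCalls {

    // Stand-in for a slow blocking REST call such as service1.doSomething()
    static String blockingCall(String name) {
        try {
            TimeUnit.MILLISECONDS.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name + "-answer";
    }

    public static void main(String[] args) {
        ExecutorService bestExecutor = Executors.newFixedThreadPool(2);
        CompletableFuture<String> task1Cs =
                CompletableFuture.supplyAsync(() -> blockingCall("task1"), bestExecutor);
        CompletableFuture<String> task2Cs =
                CompletableFuture.supplyAsync(() -> blockingCall("task2"), bestExecutor);
        // Both calls run concurrently; the total wait is ~one call, not the sum.
        String result = CompletableFuture.allOf(task1Cs, task2Cs)
                .thenApply(nothing -> task1Cs.join() + " & " + task2Cs.join())
                .join();
        System.out.println(result);
        bestExecutor.shutdown();
    }
}
```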
    

The second approach has benefits, but I can't figure out which type of thread pool is best for this kind of task:

ExecutorService bestExecutor = Executors.newFixedThreadPool(30); // or Executors.newCachedThreadPool() or Executors.newWorkStealingPool()

My question is: which ExecutorService is best for processing n parallel blocking IO tasks?

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
theSemenov
  • 389
  • 4
  • 17

4 Answers

5

On completely CPU-bound tasks you do not get additional performance by using more threads than CPU cores. So in that scenario, an 8-core/8-thread CPU needs only 8 threads to maximize performance, and loses performance with more. IO tasks usually do gain performance from a thread count larger than the number of CPU cores, as CPU time is available to do other work while waiting for IO. But even when the CPU overhead of each thread is low, there are limits to scaling, since each thread eats into memory and incurs caching and context-switch costs.

Given that your task is IO-limited, and you didn't provide any other constraints, you should probably just run a separate thread for each of your IO tasks. You can achieve this with either a fixed or a cached thread pool.

If the number of your IO tasks is very large (thousands or more), you should limit the maximum size of your thread pool, as there is such a thing as too many threads.

If your tasks are CPU-bound, you should limit the thread pool to an even smaller size. The number of cores can be fetched dynamically by using:

int cores = Runtime.getRuntime().availableProcessors();

Also, just as your CPU has a scaling limit, your IO device usually has one too. You should not exceed that limit, but without measuring it is hard to say where it lies.
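The sizing advice above can be sketched like this; the factor of 10 for the IO pool is an arbitrary illustration, not a rule from this answer:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // CPU-bound work: no benefit beyond roughly one thread per core
        ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
        // IO-bound work: oversubscribe, but keep a hard upper bound
        ExecutorService ioPool = Executors.newFixedThreadPool(cores * 10);
        System.out.println("cores = " + cores);
        cpuPool.shutdown();
        ioPool.shutdown();
    }
}
```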

Talijanac
  • 1,087
  • 9
  • 20
  • Or use *less* than all cores, as the JVM, OS, utilities, monitoring apps, file system, etc. need cores for their execution. – Basil Bourque May 05 '22 at 18:24
  • Ok, but which is the best choice for IO tasks, fixedThreadPool or cachedThreadPool? My suggestion is: fixedThreadPool is the preferred choice for a large number of IO tasks, since we need to restrict our load; cachedThreadPool is preferred for apps where such IO tasks appear rarely (but we still need the result as fast as we can) and we don't need to keep many unused threads. Am I right or am I mistaken somewhere? – theSemenov May 05 '22 at 18:31
  • 1
    cachedThreadPool is perfectly fine as long as you are aware of the behavior of your IO code. – Talijanac May 05 '22 at 19:52
  • Sometimes a system is more parallel in IO than in CPU, like a large DB with a small Java server in front of it. In such systems you can increase your throughput while being completely CPU-limited, e.g. running one more query while the server CPU is at 100%. The price you pay is prolonged latency at the server: it is not going to be responsive, but the system overall will continue to increase throughput with additional queries. So the "right" option is mostly about what you have and what you want to achieve. Often parallelization of code is memory-constrained: more tasks require more memory. – Talijanac May 05 '22 at 20:01
  • Now I notice that your Answer focuses on CPU-bound tasks. But that is not the case in the Question. The Question clearly states that these tasks are *not* CPU-bound, they are making REST calls to web services. – Basil Bourque May 06 '22 at 03:43
  • The post does not focus on CPU-bound tasks; it focuses on bounded tasks. All unconstrained, perfectly scaling IO (implied in the question) will eventually become CPU-bound, even if the IO itself is perfectly scalable and instantaneous, because the CPU overhead of parallelism is the "serial part" of the task and eventually ends up limiting you. And that is why you can't ignore it. – Talijanac May 06 '22 at 06:18
4

Project Loom

Your situation is suited to using the new features being proposed for future versions of Java: virtual threads and structured concurrency. These are part of Project Loom.

Today’s Java threads are mapped one-to-one onto host operating system threads. When Java code blocks, the host thread blocks. The host OS thread sits idle, waiting for execution to resume. Host OS threads are heavyweight, costly in terms of both CPU and memory, so this idling is not optimal.

In contrast, virtual threads in Project Loom are mapped many to one onto the host OS thread. When code in a virtual thread blocks, that task is “parked”, set aside to allow another virtual thread’s task some execution time. This parking of virtual threads is managed within the JVM, so it is highly optimized, very fast, very efficient both in CPU and in memory. As a result, Java apps running on common hardware can support thousands, even millions, of virtual threads at a time.

The ExecutorService is AutoCloseable in Loom. So we can use try-with-resources to contain your entire batch of tasks in a try ( ExecutorService es = Executors.newVirtualThreadPerTaskExecutor() ) { … submit tasks … }. Once completed, the flow of control exits from the try-with-resources block, and you know your tasks are done. Access the Future object returned for each task you submitted. No need for CompletableFuture.
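That pattern can be sketched as follows; it assumes a runtime where virtual threads have shipped (Java 21+, or Java 19/20 with preview features enabled), and the submitted lambdas are hypothetical stand-ins for the REST calls:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadsDemo {

    static String runBatch() throws Exception {
        Future<String> task1;
        Future<String> task2;
        // try-with-resources: close() waits until all submitted tasks finish
        try (ExecutorService es = Executors.newVirtualThreadPerTaskExecutor()) {
            task1 = es.submit(() -> "answer-1"); // stand-in for service1.doSomething()
            task2 = es.submit(() -> "answer-2"); // stand-in for service2.doSomething()
        }
        // Both futures are complete here, so get() does not block.
        return task1.get() + " + " + task2.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runBatch());
    }
}
```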

Loom features are now being previewed and incubated in Java 19 and 20. The virtual threads feature is planned for release in Java 21.

For more info, see the several articles, presentations, and interviews with members of the Project Loom team. These include Ron Pressler and Alan Bateman. And see relevant episodes of JEP Café.

Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154
0

If I understand your question properly, for the behaviour above, the choice of ExecutorService matters less than how you call it.

E.g.

ExecutorService executorService = Executors.newCachedThreadPool();
executorService.invokeAll(..);

Now, invokeAll(..) will block until all the supplied tasks are completed. So I feel that selecting any ExecutorService and calling invokeAll(..) will suit your requirement.
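A minimal sketch of that approach; the Callable bodies are hypothetical stand-ins for the blocking REST calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllDemo {

    static List<String> runAll() throws Exception {
        ExecutorService executorService = Executors.newCachedThreadPool();
        List<Callable<String>> tasks = List.of(
                () -> "task1-answer",  // stand-in for service1.doSomething()
                () -> "task2-answer"); // stand-in for service2.doSomething()
        // invokeAll blocks until every supplied task has completed
        List<Future<String>> futures = executorService.invokeAll(tasks);
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get()); // does not block: tasks are already done
        }
        executorService.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll());
    }
}
```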

Also, please have a look at this SE question, which discusses ExecutorCompletionService and invokeAll.

Ashish Patil
  • 4,428
  • 1
  • 15
  • 36
0

I found an optimal solution for this kind of task. All I needed to do to find it was look at the implementation of Executors.newCachedThreadPool() and Executors.newFixedThreadPool(30):

    public static ExecutorService newCachedThreadPool() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                                      60L, TimeUnit.SECONDS,
                                      new SynchronousQueue<Runnable>());
    }

My decision was to instantiate ThreadPoolExecutor directly, set an upper bound on the number of threads the pool can create, and set a timeout after which unused threads are terminated:

int nThread = 90;
long timeoutSec = 120;
// ThreadFactoryBuilder comes from Guava
ThreadFactory threadFactory = new ThreadFactoryBuilder()
                .setNameFormat("Executor-Worker-%d")
                .setDaemon(true)
                .build();
Executor delegate = new ThreadPoolExecutor(
    0,                  // core pool size: keep no threads when idle
    nThread,            // max number of threads in the pool
    timeoutSec,         // terminate idle threads after this timeout
    TimeUnit.SECONDS,
    new SynchronousQueue<Runnable>(),  // direct hand-off, no task buffering
    threadFactory
);
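One caveat, not from the answer itself: with a SynchronousQueue and a bounded maximum, submissions are rejected with RejectedExecutionException once all nThread workers are busy. A sketch of one way to soften that, using the standard ThreadPoolExecutor.CallerRunsPolicy handler (a smaller bound is used here just for the demo):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolDemo {

    static String runTwoTasks() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, 4,                 // small upper bound for the demo
                120L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                Executors.defaultThreadFactory(),
                // Once all threads are busy, run the task in the submitting
                // thread instead of throwing RejectedExecutionException.
                new ThreadPoolExecutor.CallerRunsPolicy());
        CompletableFuture<String> t1 = CompletableFuture.supplyAsync(() -> "task1", pool);
        CompletableFuture<String> t2 = CompletableFuture.supplyAsync(() -> "task2", pool);
        String result = t1.join() + "+" + t2.join();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runTwoTasks());
    }
}
```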
theSemenov
  • 389
  • 4
  • 17