
I am trying to get a very strong grasp of the basics of thread pools. I learned that a thread pool internally uses a blocking queue to 'steal' tasks and run them on the threads in the pool. That means if I had 10 tasks and 5 threads, it could run only 5 tasks at the same time, each until it finishes ENTIRELY.

Question is: Why not concurrently? Why not just time-slice those 10 tasks? What is the reason for this implementation?

Ana Maria
  • I don't think you're right about that. A thread pool is just a pool of instantiated threads. When a thread is released, it is returned to the pool so that another task can use it without incurring the overhead of spinning up a brand-new thread, and that's all it does. – Robert Harvey Aug 17 '20 at 14:26
  • It sounds like you're thinking of the `ForkJoinPool` (a specific type of thread pool in Java), instead of the more general concept of a thread pool. – Mark Rotteveel Aug 17 '20 at 14:29
  • You are 100% right, Robert. But as you said, 'so that another task can use it', and that is exactly my question. Why does it wait for tasks to finish? Why not run all 10 tasks concurrently without waiting for each one to finish? That could cause starvation. – Ana Maria Aug 17 '20 at 14:30
  • @RobertHarvey maybe the OP talks about fixed thread pool, rather than any other. – gthanop Aug 17 '20 at 14:30
  • I am talking about a fixed pool, Mark; I should have mentioned it. – Ana Maria Aug 17 '20 at 14:30
  • The task stealing (or work stealing) you're talking about only occurs in the `ForkJoinPool` though. – Mark Rotteveel Aug 17 '20 at 14:31
  • @AnaMaria What's the point of running all of them at the same time? What if you have 10k tasks and only 5 threads? Should all tasks be started? What if new tasks are added in the meantime? Should they be started as well? Something like that would lead to starvation. – Amongalen Aug 17 '20 at 14:32
    The "only run 5 at a time" is _the policy of that thread pool_. The thread pool is used precisely to control how much load is allowed to run concurrently, and the rest of the application can submit as many work items as needed without having to worry about that. – chrylis -cautiouslyoptimistic- Aug 17 '20 at 14:34
  • @Amongalen Good point, but then why don't CPU cores schedule 10k threads in the way you describe? They run them concurrently, not task by task, and that causes exactly that problem. So I am wondering why raw threads and pool threads behave differently. – Ana Maria Aug 17 '20 at 14:40
  • @chrylis-cautiouslyoptimistic- Yes! But why? Couldn't the CPU make raw threads behave that way as well? Why make it different? – Ana Maria Aug 17 '20 at 14:42

3 Answers


Why not concurrently? Why not just time-slice those 10 tasks?

You can have a thread pool that is able to perform ten concurrent tasks. You just need to configure it to have at least ten worker threads. "Time-slicing" tasks is what threads do. What thread pools do is:

  • Allow your program to control the number of threads that it uses to perform "background" tasks, and
  • Allow your program to re-use threads, which can be much more efficient than creating a new thread for each new task, and then destroying the thread when the task is complete.
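
For example, a minimal sketch (the task bodies and counts below are made up purely for illustration), using the standard `java.util.concurrent` executors:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TenConcurrentTasks {
    public static void main(String[] args) throws InterruptedException {
        // A pool with 10 worker threads can have all 10 tasks in flight at once.
        ExecutorService pool = Executors.newFixedThreadPool(10);

        for (int i = 0; i < 10; i++) {
            final int taskId = i;
            pool.submit(() ->
                System.out.println("task " + taskId + " on " + Thread.currentThread().getName()));
        }

        pool.shutdown();                              // no new tasks accepted
        pool.awaitTermination(1, TimeUnit.MINUTES);   // wait for the submitted tasks to finish
    }
}
```

With only 5 worker threads instead of 10, the same code would keep the second half of the tasks queued until a worker frees up.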
Solomon Slow
  • So if I have 1 core and 4 threads in the pool, assigned 10 tasks, would this happen: the 4 threads are each given a task at the same time, but are they running them concurrently or one by one, knowing there is only 1 physical core? Please help me understand this part. – Ana Maria Aug 17 '20 at 16:48
  • I mean, can there be time-slicing in that case? Can those 4 threads get suspended in the middle of their work, or does the second pool thread only get a chance on the CPU once one thread finishes its task? – Ana Maria Aug 17 '20 at 16:50
  • @AnaMaria, I think you should try to understand thread _pools_ as separate from _threads_. The simplest thread pool is just a blocking queue of tasks and a collection of identical worker threads. Each worker loops forever, waiting to take a task from the queue, performing the task, and then returning to wait for another (see the sketch after these comments). More sophisticated thread pools do the same thing, but also provide means to cancel tasks, to shut down the pool, to kill workers when demand is low, to create new ones when demand is high, etc. The underlying thread machinery does not know or care that a pool is using it. – Solomon Slow Aug 17 '20 at 17:33
  • ...Talking just about _threads_ (pool threads being treated no differently from any other threads), "time-slicing" means that when the number of threads that are "ready to run" is greater than the number of CPUs available to run them, the operating system _scheduler_ will periodically suspend a thread that has used up its turn on a CPU, and let a different thread have the CPU for a turn. The "turns" could be anywhere from a thousandth of a second each to a tenth of a second each, possibly depending on the whims of the system administrator. – Solomon Slow Aug 17 '20 at 17:36
  • Thanks, it helped a lot! – Ana Maria Aug 17 '20 at 22:16
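
To make the comments above concrete, here is a rough sketch of the "simplest thread pool" described there. The class name and structure are invented for illustration only; a real pool such as `ThreadPoolExecutor` adds shutdown, task cancellation, resizing, and so on:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A bare-bones pool: a blocking queue of tasks plus identical worker threads.
class SimplePool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    SimplePool(int workers) {
        for (int i = 0; i < workers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Runnable task = queue.take(); // block until a task is available
                        task.run();                   // run it to completion...
                    }                                 // ...then loop back for the next one
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // a real pool would use this for shutdown
                }
            });
            worker.start();
        }
    }

    void submit(Runnable task) {
        queue.add(task); // workers pick tasks up in FIFO order
    }
}
```

The worker threads here are ordinary threads; whether they get time-sliced on the CPU is entirely up to the operating system scheduler, exactly as described in the comment about time-slicing above.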

In order to "time-slice 10 tasks", those tasks need to be in 10 separate threads that run concurrently.

The time-slicing scheduling algorithm is implemented by the operating system, not by Java. Time slicing applies to threads in Java because Java threads are implemented as native operating system threads: every Java thread has a native thread of its own, and these threads are scheduled by the operating system as it sees fit.

There is no difference between "thread pool threads" and "raw threads" here. If you give an instance of Runnable to a thread (whether it's part of a thread pool or not) it will run from beginning to end, subject to the time slicing scheduling algorithm of the operating system.
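
As a rough illustration (the task below is made up), the same `Runnable` can be handed to a raw thread or to a pool thread, and in both cases it runs from beginning to end on whichever thread executes it:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RawVsPooled {
    public static void main(String[] args) {
        Runnable task = () ->
            System.out.println("running start to finish on " + Thread.currentThread().getName());

        new Thread(task).start();        // a "raw" thread: runs the task to completion

        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(task);               // a pool thread: also runs the task to completion
        pool.shutdown();
        // Either way, the OS scheduler decides when each thread actually gets CPU time.
    }
}
```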

So why not use thousands of threads, why even bother with thread pools? It turns out that operating system threads are a relatively expensive and scarce resource, and therefore so are Java threads.

Since operating system threads are so expensive, Project Loom is investigating adding lightweight user-space threads to Java. Some of the details in this answer may change when/if Loom gets merged into mainstream Java.

Joni

Some good answers but I thought I'd respond to your questions specifically.

I learned that a thread pool internally uses a blocking queue to 'steal' tasks and run them on the threads in the pool. That means if I had 10 tasks and 5 threads, it could run only 5 tasks at the same time, each until it finishes ENTIRELY.

If you configure your thread pool to have 5 threads (`Executors.newFixedThreadPool(5)`), then it will start 5 threads to run your jobs. Initially, 5 jobs are handed to the 5 threads to run concurrently (truly in parallel only if your server has 5 CPUs available). Once one of the 5 jobs finishes, a 6th job is immediately started on the now-idle thread. This continues until all 10 jobs have been run.
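
A small sketch of that behaviour (the sleep just stands in for real work):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FixedPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(5); // at most 5 jobs run at once

        for (int i = 1; i <= 10; i++) {
            final int jobId = i;
            pool.submit(() -> {
                System.out.println("job " + jobId + " started on " + Thread.currentThread().getName());
                try {
                    Thread.sleep(1000); // pretend to do a second of work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // When this job returns, the worker immediately takes the next queued job.
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Jobs 1 to 5 start right away; jobs 6 to 10 sit in the pool's queue until a worker frees up.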

Question is: Why not concurrently? Why not just time-slice those 10 tasks? What is the reason for this implementation?

You can instead use a cached thread pool (`Executors.newCachedThreadPool()`) if you want, which will start a thread for each of the 10 jobs you submit and run them all concurrently. This might work fine for 10 jobs but won't work well with 100,000 jobs: you would not want to start 100,000 threads. We use a fixed thread pool when we want to limit the number of jobs that run concurrently. It might seem like running 5 jobs at a time would always be slower than running all 10 at once, but this isn't necessarily the case. There is a cost when the OS time-slices between the jobs, and overall throughput may be higher with 5 threads than with 10, depending on how many processors your hardware has. Limiting the number of concurrent jobs also puts less stress on your server and should make your application work better alongside other running applications.
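
For comparison, a sketch of the cached variant (again with invented jobs); because every job keeps its thread busy, the cached pool grows to roughly one thread per job:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CachedPoolDemo {
    public static void main(String[] args) {
        // Creates a new thread whenever no idle thread is available to take a job.
        ExecutorService pool = Executors.newCachedThreadPool();

        for (int i = 1; i <= 10; i++) {
            final int jobId = i;
            pool.submit(() -> {
                System.out.println("job " + jobId + " on " + Thread.currentThread().getName());
                try {
                    Thread.sleep(1000); // long enough that the pool must grow for each new job
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();
    }
}
```

Fine at 10 jobs; submit 100,000 of them and you would be asking the OS for 100,000 threads, which is exactly what the fixed pool protects you from.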

See my answer here about scaling threads: Concept behind putting wait(), notify() methods in Object class

Gray