37

What is the difference between

newSingleThreadExecutor vs newFixedThreadPool(20)

from an operating system and a programming point of view?

Whenever I run my program using newSingleThreadExecutor, it works very well and the end-to-end latency (95th percentile) comes out at around 5 ms.

But as soon as I start running my program using

newFixedThreadPool(20)

its performance degrades and I start seeing end-to-end latency of around 37 ms.

So now I am trying to understand, from an architecture point of view, what the number of threads means here. And how do I decide the optimal number of threads to choose?

And what will happen if I use more threads?

If anyone can explain these things to me in layman's terms, that would be very useful. Thanks for the help.
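For reference, this is roughly how I am switching between the two setups (a simplified sketch; `doWork()` is just a placeholder for the real task whose latency I measure):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolComparison {

    public static void main(String[] args) {
        // Variant 1: one worker thread, tasks execute strictly one after another
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Variant 2: 20 worker threads, tasks may execute concurrently
        // ExecutorService executor = Executors.newFixedThreadPool(20);

        for (int i = 0; i < 1000; i++) {
            executor.submit(PoolComparison::doWork);
        }
        executor.shutdown();
    }

    private static void doWork() {
        // placeholder for the real work whose end-to-end latency is measured
    }
}
```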

My machine spec (I am running my program on a Linux machine):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping        : 7
cpu MHz         : 2599.999
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm arat pln pts
bogomips        : 5199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping        : 7
cpu MHz         : 2599.999
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm arat pln pts
bogomips        : 5199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
arsenal
  • Weren't you satisfied with the answer here: http://stackoverflow.com/questions/16125626/performance-issues-with-newfixedthreadpool-vs-newsinglethreadexecutor ? – AllTooSir Apr 21 '13 at 06:06
  • I wanted to understand it in more detail. The question I posted there was more about the programming and finding the bottleneck, but Gray suggested it might be an issue with the thread pool size. So I thought I'd post another question, this time more specific to the architecture point of view. – arsenal Apr 21 '13 at 06:10

3 Answers

55

OK. Ideally, assuming your threads do not hold locks such that they do not block each other (they are independent of each other), and assuming the workload (processing) is the same for each, it turns out that a pool size of `Runtime.getRuntime().availableProcessors()` or `availableProcessors() + 1` gives the best results.
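For example, something along these lines (a sketch only; the summing loop is just a stand-in for an independent, CPU-bound task):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuSizedPool {

    public static void main(String[] args) {
        // Size the pool to the number of cores visible to the JVM
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores + 1);

        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                long sum = 0;
                for (long j = 0; j < 1_000_000L; j++) {
                    sum += j;                 // purely CPU-bound placeholder work
                }
                return sum;                   // submitted as a Callable<Long>
            });
        }
        pool.shutdown();
    }
}
```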

But if the threads interfere with each other or have I/O involved, then Amdahl's law explains it pretty well. From the wiki:

Amdahl's law states that if P is the proportion of a program that can be made parallel (i.e., benefit from parallelization), and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processors is

S(N) = 1 / ((1 − P) + P/N)

In your case, based upon the number of cores available and what the threads precisely do (pure computation? I/O? holding locks? blocked on some resource? etc.), you need to come up with a pool size based upon the above parameters.
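As a rough worked example (the numbers are purely illustrative, not measured from your program): with a parallel fraction of P = 0.9, two cores cap the speedup at about 1.8x, no matter how many threads you create:

```java
public class AmdahlEstimate {

    // Maximum theoretical speedup for parallel fraction p on n processors
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 0.9;                        // assumed parallel fraction (illustrative)
        System.out.println(speedup(p, 2));     // ~1.82 with 2 cores
        System.out.println(speedup(p, 20));    // ~6.90 even with 20 processors
    }
}
```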

For example: some months back I was collecting data from numerous websites. My machine had 4 cores and I used a pool size of 4. But because the operation was purely I/O-bound and my network speed was decent, I realized that I got the best performance with a pool size of 7. That is because the threads were not fighting for computational power but for I/O, so more threads could contend for a core productively.
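If you can measure roughly how long a task waits on I/O versus how long it actually computes, a commonly cited sizing heuristic (it also appears in the book mentioned in the PS below) is threads ≈ cores * (1 + wait time / compute time). A sketch, with made-up numbers you would replace with your own measurements:

```java
public class IoBoundPoolSize {

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // Illustrative figures: per task, 80 ms waiting on the network
        // and 20 ms of actual computation. Measure these in your own code.
        double waitTime = 80.0;
        double computeTime = 20.0;

        // threads = cores * (1 + wait/compute)
        int suggested = (int) (cores * (1 + waitTime / computeTime));
        System.out.println("suggested pool size: " + suggested);
    }
}
```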

PS: I suggest going through the chapter on performance in the book Java Concurrency in Practice by Brian Goetz. It deals with such matters in detail.

Jatin
  • Thanks Jatin for the suggestion. Regarding the core count, I also posted the config spec of my Linux machine. Can you figure out from that how many cores I have? I am not able to work it out by looking at the config spec. – arsenal Apr 21 '13 at 06:21
  • @TechGeeky calling `Runtime.getRuntime().availableProcessors()` will return you the number of cores. – Jatin Apr 21 '13 at 06:22
  • Yeah, it looks like I only have 2 cores on my load-and-performance machine. And I was running my program with 20 threads, so that is the reason I am seeing such severe performance issues in my program. Right? – arsenal Apr 21 '13 at 06:27
  • @TechGeeky Absolutely. `20` is actually overkill :P. I would suggest setting the size based upon Amdahl's law and playing around with it by increasing or decreasing the pool size around that value. – Jatin Apr 21 '13 at 06:29
  • Yeah, sure, I will play around with that. Just one more thing I wanted to clear up: when you say pool size, you are talking about `threads` in this code, `newFixedThreadPool(threads)`, right? I should be using `3` there, but again I can play around with that number and see how it behaves. – arsenal Apr 21 '13 at 06:34
  • Yes, that is what I mean - the `int` in `newFixedThreadPool(someInt)`. – Jatin Apr 21 '13 at 06:37
  • If time is available, you can even change this `int` value at runtime, but the application itself will have to detect and compute the count and keep changing it. – Jatin Apr 21 '13 at 06:38
  • I know this question has been closed since I accepted an answer. I was reading an article related to computers/cores/threads. This is the [article](http://www.cpu-world.com/news_2011/2011040302_Xeon_E7_microprocessors_to_launch_next_week.html) I am talking about. In that article they list various models, and they list both cores and threads. What is the difference between CORES and THREADS then? So far everything you have told me relates to `CORES`, and now I have seen one more term, so I just wanted to understand what it means. Thanks for the help. – arsenal Apr 21 '13 at 21:14
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/28618/discussion-between-techgeeky-and-jatin) – arsenal Apr 21 '13 at 21:14
  • Completely agree with @Jatin that a thread pool size of the number of available processors will give the best result. – Sumanth Varada Jan 08 '19 at 10:26
7

So now I am trying to understand, from an architecture point of view, what the number of threads means here.

Each thread has its own stack memory, program counter (like a pointer to the next instruction to execute) and other local resources. Swapping between them hurts latency for a single task. The benefit is that while one thread is idle (usually when waiting for I/O), another thread can get work done. Also, if multiple processors are available, threads can run in parallel, provided there is no resource and/or lock contention between the tasks.

And how do I decide the optimal number of threads to choose?

The trade-off between the cost of swapping and the opportunity to avoid idle time depends on the small details of what your task looks like (how much I/O, and when; how much work there is between I/O calls; how much memory it needs to complete). Experimentation is always the key.
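One crude way to experiment is to time the same batch of work at several pool sizes (a sketch only; the summing loop stands in for your real task, and a serious benchmark would also need warm-up and repeated runs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizeExperiment {

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the real task; replace with your actual workload
        Callable<Long> task = () -> {
            long sum = 0;
            for (long i = 0; i < 5_000_000L; i++) {
                sum += i;
            }
            return sum;
        };

        for (int size = 1; size <= 32; size *= 2) {
            ExecutorService pool = Executors.newFixedThreadPool(size);
            List<Callable<Long>> batch = new ArrayList<>();
            for (int i = 0; i < 200; i++) {
                batch.add(task);
            }

            long start = System.nanoTime();
            pool.invokeAll(batch);            // blocks until every task has finished
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println(size + " threads -> " + elapsedMs + " ms");
            pool.shutdown();
        }
    }
}
```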

And what will happen if I use more threads?

There will usually be roughly linear growth in throughput at first, then a relatively flat part, then a drop (which may be quite steep). Each system is different.

Mel Nicholson
5

Looking at Amdahl's law is fine, especially if you know exactly how big P and N are. Since this will never really happen, you could monitor the performance (which you should do anyway) and increase/decrease your thread pool size to optimize whatever performance metrics are important to you.
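If you build the pool as a `ThreadPoolExecutor`, you can even resize it while it is running, based on whatever metric you monitor (a sketch only; the trigger logic and the new size are up to you):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class ResizablePool {

    public static void main(String[] args) {
        // newFixedThreadPool is backed by a ThreadPoolExecutor, so this cast is safe
        ThreadPoolExecutor pool =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(4);

        // ... later, if your metrics say 4 threads is not the sweet spot:
        pool.setMaximumPoolSize(8);   // when growing, raise the maximum first
        pool.setCorePoolSize(8);      // then raise the core size to match

        pool.shutdown();
    }
}
```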

Ralf H