
I'm using Java's fork/join framework for a CPU-intensive calculation.

I've tweaked the "sequential threshold" (used to decide whether to create subtasks or do the work directly) a bit, but to my disappointment, going from single-threaded to 4+4 cores only roughly doubles overall performance. The pool does report 8 CPUs, and when I manually set the parallelism to 2, 3, 4, .. I see gradual increases, but throughput still tops out at about twice that of a single thread. Also, the Linux System Activity monitor hovers around 50% for that Java process.
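For reference, here is the kind of split-below-threshold `RecursiveTask` I mean. This is a hypothetical sketch, not my actual code: the class name, the threshold value, and the array-sum workload are all illustrative stand-ins.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch: fork subtasks above a "sequential threshold",
// otherwise do the work directly in this thread.
public class SumTask extends RecursiveTask<Long> {
    static final int THRESHOLD = 10_000; // the "sequential threshold" (illustrative value)
    final long[] data;
    final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // small enough: compute sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;           // otherwise split the range in two
        SumTask left = new SumTask(data, lo, mid);
        left.fork();                          // run the left half asynchronously
        long rightSum = new SumTask(data, mid, hi).compute(); // right half here
        return rightSum + left.join();        // then wait for the left half
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = 1;
        long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(total); // 1000000
    }
}
```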

Also very suspicious: when I start multiple Java processes, the collective throughput is much more in line with expectations (almost 4 times faster than a single thread), and the System Activity monitor shows correspondingly higher CPU use.

Is it possible that there is a limitation in Java, Linux, or the fork/join framework that prevents full CPU usage? Any suggestions or similar experiences?

NB. This is on an Intel 3770 CPU (4 physical cores plus 4 hyper-threaded logical cores), running Oracle Java 7u13 on a Linux Mint box.

Ray
    To understand what's happening on your fork-join setup, you need to figure out where the bottleneck is. Beyond that, it's really hard for us to make specific suggestions based just on the information in your question. – NPE Mar 14 '13 at 16:12
  • Interesting situation. Regarding the parallelization speedup, this is the theoretical limit: [Amdahl's law](http://en.wikipedia.org/wiki/Amdahl's_law) – linski Mar 14 '13 at 16:12
  • Also, you can't count on hyperthreaded cores as you can on the "real" cores, e.g.: [1](http://stackoverflow.com/questions/680684/multi-cpu-multi-core-and-hyper-thread) [2](http://stackoverflow.com/questions/360307/multicore-hyperthreading-how-are-threads-distributed). But the first thing I'd do is to follow NPE's advice – linski Mar 14 '13 at 16:23
  • 1
    looks like a lot of blocking system calls that put the threads on wait, thereby lowering the CPU load reported by the kernel – Ralf H Mar 14 '13 at 16:32
  • try increasing parallelism in pool even higher, 16, 32, see what's happening. – ZhongYu Mar 14 '13 at 16:47
  • 1
    @Ray you should look around for already written performance tests for FJ. Run it on your machine and if you see 100% utilization on your computer there is a good chance it is your code. – John Vint Mar 14 '13 at 17:42
  • 1
    Post a little code. what does your compute() look like. Just a little code, not every step. – edharned Mar 14 '13 at 19:00
  • Keep in mind that it is relatively easy to saturate all available memory bandwidth using a single CPU on some applications. This may or may not be your bottleneck, but more threads rarely equates to a straightforward linear speedup. – Gian Mar 14 '13 at 21:20

1 Answer


Thanks for the thoughts and answers, everyone! From your suggestions, I concluded that the problem was not the framework itself, and went on to do some more testing. I found that after a few minutes, the CPU load dropped down to 15%!

It turns out that `java.util.Random` (which I use extensively) performs poorly in a multithreaded setup: all threads share one instance, and every call has to atomically update its single seed, so the threads end up contending on it instead of computing. The solution was to use `ThreadLocalRandom.current().nextXXX()` instead. I'm now up to a consistent 80% usage (there are still some sequential passages left). Sweet!
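For anyone hitting the same wall, the change is a one-liner at each call site. A minimal sketch (the class name and stand-in workload are illustrative, not my actual code):

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class RandomContention {
    // Before: one Random shared by all worker threads. Each call advances
    // the shared seed with a compare-and-swap loop, so under heavy parallel
    // use threads spin retrying the CAS instead of doing useful work.
    static final Random SHARED = new Random();

    static double sharedSample() {
        return SHARED.nextDouble();
    }

    // After: ThreadLocalRandom keeps an independent generator per thread,
    // so there is no shared seed to contend on.
    static double localSample() {
        return ThreadLocalRandom.current().nextDouble();
    }

    public static void main(String[] args) {
        // Both produce uniform doubles in [0, 1); only the scaling differs.
        double sum = IntStream.range(0, 1_000_000).parallel()
                .mapToDouble(i -> localSample())
                .sum();
        // Each sample is in [0, 1), so the sum of a million of them
        // must land strictly between 0 and 1,000,000.
        System.out.println(sum > 0 && sum < 1_000_000); // true
    }
}
```

Note that `ThreadLocalRandom.current()` must be called on the thread that uses the result; don't cache the returned instance in a field shared across threads.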

Thanks again for putting me on the right track.

Ray