I'm using Java's fork/join framework to handle a CPU-intensive calculation.
I've tuned the "sequential threshold" (which decides whether a task splits into subtasks or does the work directly) a bit, but to my disappointment, going from single-threaded to all 4 cores plus 4 hyperthreads only about doubles overall performance. The pool does report 8 CPUs, and when I manually set the parallelism to 2, 3, 4, ... I see gradual increases in performance, but throughput still tops out at about twice the single-threaded rate. The Linux System Activity monitor also hovers around 50% CPU for that Java process.
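A minimal sketch of this kind of setup: a RecursiveTask that splits its range in half until it is no larger than the sequential threshold, plus a sweep over ForkJoinPool parallelism levels from 1 up to the reported CPU count. The `work()` kernel, the class name `ThresholdDemo`, the problem size and the threshold value are all hypothetical assumptions standing in for the real calculation. The sketch avoids lambdas so it also compiles on Java 7.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ThresholdDemo {

    // HYPOTHETICAL stand-in for the real CPU-bound calculation: an
    // integer-only mixing loop, so sums are exact regardless of split order.
    static long work(int i) {
        long x = i;
        for (int k = 0; k < 200; k++) {
            x = x * 6364136223846793005L + 1442695040888963407L; // LCG step, wraps on overflow
            x ^= (x >>> 29);
        }
        return x & 0xFFFF; // keep each term small
    }

    // Sums work(i) for i in [from, to); ranges no larger than the threshold
    // are computed sequentially, larger ones are split in half.
    static final class SumTask extends RecursiveTask<Long> {
        private final int from, to, threshold;

        SumTask(int from, int to, int threshold) {
            this.from = from;
            this.to = to;
            this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            if (to - from <= threshold) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += work(i);
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(from, mid, threshold);
            SumTask right = new SumTask(mid, to, threshold);
            left.fork();                     // push left half onto this worker's queue; idle workers may steal it
            long rightSum = right.compute(); // do the right half in this thread
            return rightSum + left.join();   // wait for the forked half
        }
    }

    static long sumSequential(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += work(i);
        return sum;
    }

    static long sumParallel(int parallelism, int n, int threshold) {
        if (threshold < 1) throw new IllegalArgumentException("threshold must be >= 1");
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SumTask(0, n, threshold));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int threshold = 10_000; // ASSUMED value; this is the knob being tuned
        int cpus = Runtime.getRuntime().availableProcessors();

        sumSequential(n); // crude JIT warm-up; a real benchmark would use a proper harness

        long t0 = System.nanoTime();
        long expected = sumSequential(n);
        double seqMs = (System.nanoTime() - t0) / 1e6;
        System.out.printf("sequential:    %8.1f ms%n", seqMs);

        for (int p = 1; p <= cpus; p++) {
            long t = System.nanoTime();
            long got = sumParallel(p, n, threshold);
            double ms = (System.nanoTime() - t) / 1e6;
            System.out.printf("parallelism %d: %8.1f ms  speedup %.2fx  %s%n",
                    p, ms, seqMs / ms, got == expected ? "ok" : "MISMATCH");
        }
    }
}
```

Running `main` prints the sequential time and then the time and speedup for each parallelism level up to `Runtime.getRuntime().availableProcessors()`, which is the same kind of sweep described above. The timings include pool creation and use only a crude warm-up, so they are indicative rather than precise.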
Also very suspicious: when I start multiple Java processes, the combined throughput is much more in line with expectations (almost 4 times that of a single thread), and the System Activity monitor shows higher CPU usage.
Is it possible that there is a limitation in Java, Linux, or the fork/join framework that prevents full CPU usage? Any suggestions or similar experiences?
NB: This is on an Intel 3770 CPU (4 physical cores with Hyper-Threading, i.e. 8 hardware threads), running Oracle Java 7r13 on Linux Mint.