Note: I already addressed this problem in another SO post - Using a semaphore inside a nested Java 8 parallel stream action may DEADLOCK. Is this a bug? -, but the title of this post suggested that the problem is related to the use of a semaphore - which somewhat distracted the discussion. I am creating this one to stress that nested loops might have a performance issue - although both problems have likely a common cause (and maybe because it took me a lot of time to figure out this problem). (I don't see it as a duplicate, because it is stressing another symptom - but if you do just delete it).
Problem: If you nest two Java 8 stream.parallel().forEach loops and all tasks are independent, stateless, etc. - except for being submitted to the common FJ pool -, then nesting a parallel loop inside a parallel loop performs much poorer than nesting a sequential loop inside a parallel loop. Even worse: If the operation containing the inner loop is synchronized, your will get a DEADLOCK.
Demonstration of the performance issue
Without the 'synchronized' you can still observe a performance problem. You find a demo code for this at: http://svn.finmath.net/finmath%20experiments/trunk/src/net/finmath/experiments/concurrency/NestedParallelForEachTest.java (see the JavaDoc there for a more detailed description).
Our setup here is as follows: We have a nested stream.parallel().forEach().
- The inner loop is independent (stateless, no interference, etc. - except of the use of a common pool) and consumes 1 second in total in the worst case, namely if processed sequential.
- Half of the tasks of the outer loop consume 10 seconds prior that loop.
- Half consume 10 seconds after that loop.
- Hence every thread consumes 11 seconds (worst case) in total. * We have a boolean which allows to switch the inner loop from parallel() to sequential().
Now: submitting 24 outer-loop-tasks to a pool with parallelism 8 we would expect 24/8 * 11 = 33 seconds at best (on an 8 core or better machine).
The result is:
- With inner sequential loop: 33 seconds.
- With inner parallel loop: >80 seconds (I had 92 seconds).
Question: Can you confirm this behavior? Is this something one would expect from the framework? (I am a bit more careful now with a claim that this is a bug, but I personally believe that it is due to a bug in the implementation of ForkJoinTask. Remark: I have posted this to concurrency-interest (see http://cs.oswego.edu/pipermail/concurrency-interest/2014-May/012652.html ), but so far I did not get confirmation from there).
Demonstration of the the deadlock
The following code will DEADLOCK
// Outer loop
IntStream.range(0,numberOfTasksInOuterLoop).parallel().forEach(i -> {
doWork();
synchronized(this) {
// Inner loop
IntStream.range(0,numberOfTasksInInnerLoop).parallel().forEach(j -> {
doWork();
});
}
});
where numberOfTasksInOuterLoop = 24
, numberOfTasksInInnerLoop = 240
, outerLoopOverheadFactor = 10000
and doWork
is some stateless CPU burner.
You find a complete demo code at http://svn.finmath.net/finmath%20experiments/trunk/src/net/finmath/experiments/concurrency/NestedParallelForEachAndSynchronization.java (see the JavaDoc there for a more detailed description).
Is this behavior expected? Note that the documentation on Java parallel streams does not mention any issue with nesting or synchronization. Also, the fact that both use a common fork-join-pool is not mentioned.
Update
Another test on the performance issue can be found at http://svn.finmath.net/finmath%20experiments/trunk/src/net/finmath/experiments/concurrency/NestedParallelForEachBenchmark.java - this test come without any blocking operation (no Thread.sleep and not synchronized). I compiled some more remarks here: http://christian-fries.de/blog/files/2014-nested-java-8-parallel-foreach.html
Update 2
It appears as if this problem and the more severe DEADLOCK with semaphores has been fixed in Java8 u40.