
Here's my understanding of the Stream framework of Java 8:

  1. Something creates a source Stream
  2. The implementation is responsible for providing a BaseStream#parallel() method, which in turn returns a Stream that can run its operations in parallel.
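The two steps above can be sketched with the standard API as follows. This is only a minimal illustration, assuming Java 8+: the class and method names (`ParallelSourceSketch`, `parallelFlags`) are made up, and `isParallel()` only reports the execution-mode flag. It says nothing about which thread pool will eventually run the work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ParallelSourceSketch {
    // Returns the isParallel() flag for a few ways of obtaining a stream.
    static boolean[] parallelFlags() {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        // 1. Something creates a source Stream; Collection#stream() is sequential.
        Stream<Integer> source = numbers.stream();
        boolean sequentialSource = source.isParallel();

        // 2. BaseStream#parallel() returns an equivalent stream marked for parallel execution.
        boolean afterParallel = source.parallel().isParallel();

        // Sources that are parallel from the start.
        boolean parallelStream = numbers.parallelStream().isParallel();
        boolean viaStreamSupport = StreamSupport.stream(numbers.spliterator(), true).isParallel();

        return new boolean[] {sequentialSource, afterParallel, parallelStream, viaStreamSupport};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parallelFlags())); // [false, true, true, true]
    }
}
```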

While someone has already found a way to use a custom thread pool with the Stream framework's parallel executions, I cannot for the life of me find any mention in the Java 8 API that the default Java 8 parallel Stream implementations use ForkJoinPool#commonPool() (Collection#parallelStream(), the methods in the StreamSupport class, and other possible sources of parallel-enabled streams in the API that I don't know about).
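The behavior can at least be observed, even though it is not specified. The following sketch (hypothetical class `WhichPoolSketch`) records every thread that processes an element of a parallel `IntStream` and checks that each one is either the calling thread or a `ForkJoinWorkerThread` belonging to `ForkJoinPool.commonPool()`. On OpenJDK this appears to hold, but it demonstrates what one implementation does, not a documented guarantee.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.stream.IntStream;

public class WhichPoolSketch {
    // Runs a parallel stream from the calling thread and reports whether every
    // thread that handled an element was the caller or a common-pool worker.
    static boolean onlyCallerOrCommonPool() {
        Thread caller = Thread.currentThread();
        Set<Thread> seen = ConcurrentHashMap.newKeySet();

        IntStream.range(0, 10_000).parallel().forEach(i -> seen.add(Thread.currentThread()));

        for (Thread t : seen) {
            boolean commonWorker = t instanceof ForkJoinWorkerThread
                    && ((ForkJoinWorkerThread) t).getPool() == ForkJoinPool.commonPool();
            if (t != caller && !commonWorker) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(onlyCallerOrCommonPool());
    }
}
```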

The only tidbits I could glean from search results were these:


So my question is:

Where is it said that the ForkJoinPool#commonPool() is used for parallel operations on streams that are obtained from the Java 8 API?

Gima
  • The very last paragraph of [here](http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html) seems to state it ("Another implementation of the fork/join framework is used by methods in the java.util.streams package, which is part of Project Lambda scheduled for the Java SE 8 release."), but it isn't quite satisfactory to me... I would *guess* that implementation details like that might not have been included to allow for future evolution, but considering that implementation details are included in so many other places it doesn't make much sense... – awksp Jul 08 '14 at 10:24
  • There's another hint [here](http://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html) ("With aggregate operations, the Java runtime performs this partitioning and combining of solutions for you."), but again, it's not quite as explicit as you might want... – awksp Jul 08 '14 at 10:28
  • Here, just dig into the sources: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/stream/AbstractTask.java#AbstractTask – Konstantin V. Salikhov Jul 08 '14 at 10:30
  • It may not be stated in the API, for the reason mentioned in the other comment: It's an implementation detail. The most official resource I found (apart from the code - that's cheating ;-)) was http://jsr166-concurrency.10961.n7.nabble.com/New-default-for-ForkJoinPool-commonPool-on-systems-with-SecurityManagers-td10447.html , where **Doug Lea** stated that "*The ForkJoinPool common pool is used in JDK8 for all parallel Stream operations, parallel sorting, etc.*" ... – Marco13 Jul 08 '14 at 10:50
  • @Marco13 Some people claimed that it is an implementation detail, and I would have hoped that it is an implementation detail. But when something went wrong, I was told that I should have known that the implementation was not compatible with a Semaphore and that a ManagedBlocker should have been used: http://stackoverflow.com/questions/23442183/using-a-semaphore-inside-a-nested-java-8-parallel-stream-action-may-deadlock-is . Clearly, such an implementation detail needs to be documented. – Christian Fries Jul 08 '14 at 10:56
  • Sure, there seem to be some issues (there's a large rant about the shortcomings of the Java 8 parallelism at http://coopsoft.com/ar/Calamity2Article.html ). These issues could be summarized as the Amoeba Effect (http://wiki.apidesign.org/wiki/Amoeba)... – Marco13 Jul 08 '14 at 13:05
  • While there are hints, they are not authoritative. This should be documented directly, not least because other implementations of the Java SE API could handle parallel streams in a totally different way. – Gima Jul 08 '14 at 13:44
  • @Marco13 Thanks for that link. Actually, I am not so much worried about the shortcomings. I even have a fix for the bug I referenced. What really worried me was the reaction to such a discussion. (I even got serial downvotes.) – Christian Fries Jul 09 '14 at 08:01
  • I would still consider it to be an implementation detail. [_Here_](http://stackoverflow.com/questions/22129471/22144111#comment33619394_22144111), Stuart Marks warns against taking too many implementation details for granted. – Holger Jul 28 '14 at 12:15
  • There is a funny example in the [Spliterator](https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html) documentation, where they calculate the batch size based on `ForkJoinPool.getCommonPoolParallelism()`. No other mention of fork/join, though. – Lukas Nov 18 '14 at 20:16
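The heuristic from that `Spliterator` example can be isolated like this. A sketch only: the class and method names are hypothetical, while the formula (estimated size divided by eight times the common-pool parallelism) follows the javadoc example mentioned above. Note that it ties the batch size to the common pool even though nothing else in the stream docs names that pool.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;

public class BatchSizeSketch {
    // Heuristic from the Spliterator javadoc example: aim for roughly eight
    // batches per unit of common-pool parallelism.
    static long targetBatchSize(Spliterator<?> s) {
        return s.estimateSize() / (ForkJoinPool.getCommonPoolParallelism() * 8);
    }

    public static void main(String[] args) {
        Integer[] data = new Integer[100_000];
        Arrays.fill(data, 1);
        Spliterator<Integer> s = Arrays.spliterator(data);
        System.out.println("common pool parallelism = " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("target batch size = " + targetBatchSize(s));
    }
}
```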

3 Answers


W.r.t. where is it documented that Java 8 parallel streams use the FJ framework?

Afaik (Java 1.8u5) it is not mentioned in the JavaDoc of parallel streams that a common ForkJoinPool is used.

But it is mentioned in the ForkJoin documentation at the bottom of http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html

W.r.t. replacing the Thread pool

My understanding is that you can use a custom ForkJoinPool instead of the common one (see "Custom thread pool in Java 8 parallel stream"), but not a custom thread pool that is a different implementation than ForkJoin (I have an open question on this: "How to (globally) replace the common thread pool backend of Java parallel streams?").
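A minimal sketch of that custom-pool approach, assuming Java 8+: the terminal operation is started from inside a task submitted to a dedicated `ForkJoinPool`, so the forked subtasks land in that pool rather than in the common pool. This relies on unspecified implementation behavior (exactly the concern in the question), and the names `CustomPoolSketch`, `sumInPool` and `ranOnlyInPool` are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

public class CustomPoolSketch {
    // Sums 1..n with a parallel stream whose terminal operation starts inside `pool`.
    static long sumInPool(int parallelism, long n) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> LongStream.rangeClosed(1, n).parallel().sum()).join();
        } finally {
            pool.shutdown();
        }
    }

    // Checks, element by element, that the work ran on threads hosted by `pool`.
    static boolean ranOnlyInPool(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> IntStream.range(0, 10_000).parallel()
                    .allMatch(i -> ForkJoinTask.getPool() == pool)).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInPool(4, 1_000_000)); // 500000500000
        System.out.println(ranOnlyInPool(4));
    }
}
```

Because this is not a specified contract, a future JDK could in principle route such work differently, which is why a global, documented way to replace the backend would be preferable.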

W.r.t. replacing the Streams api

You may check out https://github.com/nurkiewicz/LazySeq, which is a more Scala-like streams implementation: very nice, very interesting.

PS (w.r.t. ForkJoin and Streams)

If you are interested: I stumbled across some issues with the use of the FJ pool; see, e.g.

Christian Fries
  • The linked Fork/Join tutorial indeed states that some implementation of Fork/Join is used, but there is no mention of #commonPool(). Sounds like it's time for JodaStreams... – Gima Jul 08 '14 at 13:35

For what it's worth, Java 8 in Action has a chapter on Parallel data processing and performance (Chapter 7). It says:

"...the Stream interface gives you the opportunity to execute operations in parallel on a collection of data without much effort."

"...you’ll see how Java can make this magic happen or, more practically, how parallel streams work under the hood by employing the fork/join framework introduced in Java 7."

It also has a small side note in section 7.1:

"Parallel streams internally use the default ForkJoinPool...which by default has as many threads as you have processors, as returned by Runtime.getRuntime().availableProcessors()."

"you can change the size of this pool using the system property java.util.concurrent.ForkJoinPool.common.parallelism, as in the following example:"

```java
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "12");
```
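A sketch for checking what the common pool actually ended up with, assuming Java 8+ (class and method names are hypothetical). The common pool reads the property once, when it is initialized, so `System.setProperty` only has an effect if it runs before anything touches `ForkJoinPool` or a parallel stream; passing `-Djava.util.concurrent.ForkJoinPool.common.parallelism=12` to `java` avoids that ordering issue. Also, as far as I can tell from OpenJDK's source, the default parallelism is `availableProcessors() - 1` (but at least 1) rather than the full processor count, presumably because the thread that calls the terminal operation takes part in the work as well.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonParallelismSketch {
    // Targeted parallelism of the common pool, as reported by the JDK.
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        // Only effective if this runs before the common pool is first initialized.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "12");
        System.out.println("availableProcessors = " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism = " + commonParallelism());
    }
}
```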

As mentioned in the comments and other answers, this does not mean that fork/join will always be used.

dustin.schultz
  • That's fine, but the "Java 8 in Action" book is not official documentation. – Tagir Valeev Aug 06 '15 at 04:04
  • Hence my saying "for what it's worth". It is nonetheless a highly rated book, and if it were wrong, reviewers would say so. – dustin.schultz Aug 06 '15 at 04:06
  • The question is whether it's *specified* or not. If it's not specified, then it can be implemented in different way by different JDK vendors or may change in future versions of OpenJDK. If it's specified, it will stay the same forever. I cannot imagine that JDK authors stop changing the internal implementation just because this would invalidate the statement in some highly rated book. Stuart Marks [says](http://stackoverflow.com/questions/22129471/22144111#comment33619394_22144111) it's not specified. – Tagir Valeev Aug 06 '15 at 04:26
  • What?! JDK developers don't work with book authors to make sure their books are correct forever?! Lol. Yes, I understand it can change. Again, "for what it's worth". – dustin.schultz Aug 06 '15 at 04:46

You can check the source code of the terminal operations on GrepCode. For example, let's take a look at ForEachOp. As you can see, the evaluateParallel method of ForEachOp creates and invokes a ForEachTask object, which is derived from CountedCompleter, which in turn is derived from ForkJoinTask.
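To illustrate the pattern, here is a simplified `CountedCompleter` that applies an action to every index of a range, in the style of the `ForEach` example in the `CountedCompleter` javadoc. It is not the JDK's `ForEachTask`, and the names `ForEachSketch`, `RangeForEach` and the threshold of 1,000 are hypothetical. It also connects back to the question: the `ForkJoinTask` class documentation states that a task not already engaged in a ForkJoin computation is commenced in `ForkJoinPool.commonPool()` when started via `fork()`, `invoke()` or related methods, so an `invoke()` from an ordinary thread ends up using the common pool.

```java
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.IntConsumer;

public class ForEachSketch {
    // Hypothetical, simplified analogue of a parallel forEach over [lo, hi).
    static final class RangeForEach extends CountedCompleter<Void> {
        private static final int THRESHOLD = 1_000; // hypothetical leaf size
        private final int lo, hi;
        private final IntConsumer action;

        RangeForEach(CountedCompleter<?> parent, int lo, int hi, IntConsumer action) {
            super(parent);
            this.lo = lo;
            this.hi = hi;
            this.action = action;
        }

        @Override
        public void compute() {
            int l = lo, h = hi;
            // Fork off right halves until the remaining range is small enough.
            while (h - l > THRESHOLD) {
                int mid = (l + h) >>> 1;
                addToPendingCount(1);
                new RangeForEach(this, mid, h, action).fork();
                h = mid;
            }
            for (int i = l; i < h; i++) {
                action.accept(i);
            }
            // Completes this task (and ancestors) once no forked children are pending.
            tryComplete();
        }
    }

    // Sums 0..n-1 by running the completer from the calling thread.
    static long sumOfRange(int n) {
        LongAdder sum = new LongAdder();
        new RangeForEach(null, 0, n, sum::add).invoke();
        return sum.sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfRange(10_000)); // 49995000
    }
}
```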

mkrakhin