
In Java 8, the Collection interface was extended with two methods that return Stream<E>: stream(), which returns a sequential stream, and parallelStream(), which returns a possibly-parallel stream. Stream itself also has a parallel() method that returns an equivalent parallel stream (either mutating the current stream to be parallel or creating a new stream).
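
For concreteness, here is a minimal sketch of the two routes the question contrasts (class and variable names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class TwoRoutes {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta", "gamma");

        // Route 1: the dedicated factory method inherited from Collection
        long n1 = words.parallelStream().filter(s -> s.length() > 4).count();

        // Route 2: a sequential stream, switched to parallel afterwards
        long n2 = words.stream().parallel().filter(s -> s.length() > 4).count();

        System.out.println(n1 + " " + n2); // 2 2 -- identical results
    }
}
```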

The duplication has obvious disadvantages:

  • It's confusing. A related question asks whether calling both parallelStream() and parallel() is necessary to be sure the stream is parallel, given that parallelStream() may return a sequential stream. Why does parallelStream() exist if it can't make a guarantee? The other way around is also confusing -- if parallelStream() returns a sequential stream, there's probably a reason (e.g., an inherently sequential data structure for which parallel streams are a performance trap); what should Stream.parallel() do for such a stream? (UnsupportedOperationException is not allowed by parallel()'s specification.)

  • Adding methods to an interface risks conflicts if an existing implementation has a similarly-named method with an incompatible return type. Adding parallelStream() in addition to stream() doubles the risk for little gain. (Note that parallelStream() was at one point just named parallel(), though I don't know if it was renamed to avoid name clashes or for another reason.)
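
To make the conflict risk concrete, here is a hypothetical sketch (the class and its method are invented for illustration) of a pre-Java-8 collection that would have stopped compiling when Collection gained a default parallelStream():

```java
import java.util.AbstractList;

// Compiles fine on Java 7. On Java 8, uncommenting the method below is a
// compile error, because it clashes with the inherited default
// Collection.parallelStream(), whose return type is Stream<Runnable>.
class TaskList extends AbstractList<Runnable> {
    @Override public Runnable get(int index) { return () -> {}; }
    @Override public int size() { return 0; }

    // public TaskList parallelStream() { return this; }  // incompatible return type
}
```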

Why does Collection.parallelStream() exist when calling Collection.stream().parallel() does the same thing?

asked by Jeffrey Bosboom; edited by Stuart Marks

1 Answer


The Javadocs for Collection.(parallelS|s)tream() and Stream itself don't answer the question, so it's off to the mailing lists for the rationale. I went through the lambda-libs-spec-observers archives and found one thread specifically about Collection.parallelStream() and another thread that touched on whether java.util.Arrays should provide parallelStream() to match (or actually, whether it should be removed). There was no once-and-for-all conclusion, so perhaps I've missed something from another list or the matter was settled in private discussion. (Perhaps Brian Goetz, one of the principals of this discussion, can fill in anything missing.)

The participants made their points well, so this answer is mostly just an organization of the relevant quotes, with a few clarifications in [brackets], presented in order of importance (as I interpret it).

parallelStream() covers a very common case

Brian Goetz in the first thread, explaining why Collection.parallelStream() is valuable enough to keep even after other parallel stream factory methods have been removed:

We do not have explicit parallel versions of each of these [stream factories]; we did originally, and to prune down the API surface area, we cut them on the theory that dropping 20+ methods from the API was worth the tradeoff of the surface yuckiness and performance cost of .intRange(...).parallel(). But we did not make that choice with Collection.

We could either remove the Collection.parallelStream(), or we could add the parallel versions of all the generators, or we could do nothing and leave it as is. I think all are justifiable on API design grounds.

I kind of like the status quo, despite its inconsistency. Instead of having 2N stream construction methods, we have N+1 -- but that extra 1 covers a huge number of cases, because it is inherited by every Collection. So I can justify to myself why having that extra 1 method is worth it, and why accepting the inconsistency of going no further is acceptable.

Do others disagree? Is N+1 [Collection.parallelStream() only] the practical choice here? Or should we go for the purity of N [rely on Stream.parallel()]? Or the convenience and consistency of 2N [parallel versions of all factories]? Or is there some even better N+3 [Collection.parallelStream() plus other special cases], for some other specially chosen cases we want to give special support to?
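
The "N+1" option is what shipped, and the asymmetry is visible in the final API; a small sketch (values are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class NPlusOne {
    public static void main(String[] args) {
        // The "N" factories have no parallel variants; ranges opt in via parallel():
        long rangeSum = IntStream.range(0, 1_000_000).parallel().asLongStream().sum();

        // The "+1": every Collection inherits parallelStream() directly.
        List<String> words = Arrays.asList("ab", "", "cd");
        long nonEmpty = words.parallelStream().filter(s -> !s.isEmpty()).count();

        System.out.println(rangeSum + " " + nonEmpty); // 499999500000 2
    }
}
```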

Brian Goetz stands by this position in the later discussion about Arrays.parallelStream():

I still really like Collection.parallelStream; it has huge discoverability advantages, and offers a pretty big return on API surface area -- one more method, but provides value in a lot of places, since Collection will be a really common case of a stream source.

parallelStream() is more performant

Brian Goetz:

Direct version [parallelStream()] is more performant, in that it requires less wrapping (to turn a stream into a parallel stream, you have to first create the sequential stream, then transfer ownership of its state into a new Stream.)

In response to Kevin Bourrillion's skepticism about whether the effect is significant, Brian again:

Depends how seriously you are counting. Doug counts individual object creations and virtual invocations on the way to a parallel operation, because until you start forking, you're on the wrong side of Amdahl's law -- this is all "serial fraction" that happens before you can fork any work, which pushes your breakeven threshold further out. So getting the setup path for parallel ops fast is valuable.

Doug Lea follows up, but hedges his position:

People dealing with parallel library support need some attitude adjustment about such things. On a soon-to-be-typical machine, every cycle you waste setting up parallelism costs you say 64 cycles. You would probably have had a different reaction if it required 64 object creations to start a parallel computation.

That said, I'm always completely supportive of forcing implementors to work harder for the sake of better APIs, so long as the APIs do not rule out efficient implementation. So if killing parallelStream is really important, we'll find some way to turn stream().parallel() into a bit-flip or somesuch.

Indeed, the later discussion about Arrays.parallelStream() notes the reduced cost of Stream.parallel().
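
A micro-benchmark can show what is being counted. The following is a hypothetical JMH sketch (the class, method names, and sizes are mine, not from the discussion); with a small input the fixed setup cost is a large share of the total, which is the "serial fraction" Goetz refers to. Note that after the pipeline model was later simplified (see the next section), parallel() on a fresh stream became close to a flag flip, so the measured gap on a modern JDK may be negligible:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class ParallelSetupBench {
    // deliberately tiny input, so pipeline setup dominates the measurement
    List<Integer> input = IntStream.range(0, 64).boxed().collect(Collectors.toList());

    @Benchmark
    public long direct() {
        // parallel stream obtained directly from the collection
        return input.parallelStream().mapToLong(Integer::longValue).sum();
    }

    @Benchmark
    public long wrapped() {
        // sequential stream first, then switched to parallel
        return input.stream().parallel().mapToLong(Integer::longValue).sum();
    }
}
```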

stream().parallel() statefulness complicates the future

At the time of the discussion, switching a stream from sequential to parallel and back could be interleaved with other stream operations. Brian Goetz, on behalf of Doug Lea, explains why sequential/parallel mode switching may complicate future development of the Java platform:

I'll take my best stab at explaining why: because it (like the stateful methods (sort, distinct, limit), which you also don't like) moves us incrementally farther from being able to express stream pipelines in terms of traditional data-parallel constructs, which further constrains our ability to map them directly to tomorrow's computing substrate, whether that be vector processors, FPGAs, GPUs, or whatever we cook up.

Filter-map-reduce map[s] very cleanly to all sorts of parallel computing substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce does not.

So the whole API design here embodies many tensions between making it easy to express things the user is likely to want to express, and doing so in a manner that we can predictably make fast with transparent cost models.

This mode switching was removed after further discussion. In the current version of the library, a stream pipeline is either sequential or parallel; the last call to sequential()/parallel() wins. Besides side-stepping the statefulness problem, this change also improved the performance of using parallel() to set up a parallel pipeline from a sequential stream factory.
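
A minimal sketch of the "last call wins" rule (numbers are illustrative):

```java
import java.util.stream.IntStream;

public class LastCallWins {
    public static void main(String[] args) {
        int sum = IntStream.range(0, 100)
                .parallel()      // requests parallel execution...
                .map(i -> i * 2)
                .sequential()    // ...but this later call wins:
                .sum();          // the whole pipeline runs sequentially
        System.out.println(sum); // 9900 either way; only the execution mode differs
    }
}
```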

exposing parallelStream() as a first-class citizen improves programmer perception of the library, leading them to write better code

Brian Goetz again, in response to Tim Peierls's argument that Stream.parallel() allows programmers to understand streams sequentially before going parallel:

I have a slightly different viewpoint about the value of this sequential intuition -- I view the pervasive "sequential expectation" as one of the biggest challenges of this entire effort; people are constantly bringing their incorrect sequential bias, which leads them to do stupid things like using a one-element array as a way to "trick" the "stupid" compiler into letting them capture a mutable local, or using lambdas as arguments to map that mutate state that will be used during the computation (in a non-thread-safe way), and then, when it's pointed out what they're doing, shrug it off and say "yeah, but I'm not doing it in parallel."

We've made a lot of design tradeoffs to merge sequential and parallel streams. The result, I believe, is a clean one and will add to the library's chances of still being useful in 10+ years, but I don't particularly like the idea of encouraging people to think this is a sequential library with some parallel bags nailed on the side.
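
The anti-patterns Goetz lists are concrete enough to sketch. The following hypothetical example shows both the one-element-array capture "trick" and a stateful lambda passed to map; it happens to produce [0, 1, 2, ...] sequentially, but is racy and order-dependent the moment the pipeline goes parallel:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SequentialBias {
    static List<Integer> badIndexing(List<String> words) {
        int[] counter = {0};  // the "trick" for capturing a mutable local in a lambda
        // Stateful lambda passed to map: it depends on encounter order and is not
        // thread-safe -- adding .parallel() silently produces garbage indices.
        return words.stream()
                .map(w -> counter[0]++)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(badIndexing(Arrays.asList("a", "b", "c"))); // [0, 1, 2]
    }
}
```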

answered by Jeffrey Bosboom

  • I think you unearthed most of it; don't underestimate the value of discoverability. I'll just add that, since the comment on statefulness was written, the model was simplified dramatically to one where the whole pipeline is either sequential or parallel, whereas originally one could switch back and forth. This in turn reduced the cost of setting up pipelines with `sequential()` or `parallel()` calls. – Brian Goetz Jul 07 '14 at 05:58
  • @BrianGoetz: The sequential/parallel change fooled me, then. I thought it still worked like it used to, though as I didn't depend on it I guess it doesn't matter. I'll edit the answer. – Jeffrey Bosboom Jul 07 '14 at 05:59
  • @JeffreyBosboom The simple rule is now: last call wins, and governs the execution mode for the whole pipeline. – Brian Goetz Jul 07 '14 at 06:00
  • @BrianGoetz I am glad to hear this sorted out for me; while reading the documentation I did not get this message so loud and clear. Reading the sentence "Stream pipelines may execute either sequentially or in parallel" still left some room for wrong interpretation because it is not crystal-clear what exactly is a "stream pipeline". Also, since each non-terminal operation method returns a *new* stream, it was unclear for me whether the "parallel" property held for this particular stream only, or it modified the "upstream" streams as well. – Marko Topolnik Jul 07 '14 at 06:58
  • @MarkoTopolnik A stream pipeline consists of a source, zero or more intermediate operations, and a terminal operation. This is the fundamental unit of stream execution. – Brian Goetz Jul 07 '14 at 14:00
  • @BrianGoetz Yes, that part is clear, but here's the subtlety that has been bothering me: in `b = a.map(fn)`, does `b` *wrap* `a`, so `a` exists within `b` as a stream in its own right, or is it like the Builder pattern, where `b` is just another stream whose properties are a modification of `a`'s properties? The shape of the API supports both views and anyone coming from Scala/Clojure is already biased towards the "wrapping" view. However, when reasoning about statements like those I quote above, these two views bring out entirely different, sometimes opposite, conclusions. – Marko Topolnik Jul 08 '14 at 07:16
  • So there's two ways to understand a "stream pipeline": a stream could model just one pipeline stage, multiple streams composing into the full pipeline, or it could represent a *container* of the complete pipeline. While studying the documentation, I never found it easy to decide which way to look at it. – Marko Topolnik Jul 08 '14 at 07:19
  • @MarkoTopolnik It's not an accident that the API supports both interpretations! This is to maximize flexibility for implementations. If the `Stream` methods had defaults, the default implementation would wrap (that's our strategy for evolving `Stream` to add methods later.) But the real implementation does more like your latter interpretation, which allows for more efficiency (e.g., op fusing and other fun.) You could implement `Stream` entirely with wrapping but it would be slower. – Brian Goetz Jul 08 '14 at 14:58
  • @BrianGoetz OK, I understand that, on the implementation level, both ways are possible---but the key issue is the *conceptual* level: say `b = a.parallel()`, does the fact that `b` is parallel make `a` parallel as well? Or rather, does this view where `a` is a *part* of `b` (that's what wrapping is about) make sense at all? And the answer seems to be "no": `b` completely takes over `a`'s identity and whatever properties `a` had on its own is irrelevant once aggregation of `b` starts. This, however, is not something I could clearly read from the docs. – Marko Topolnik Jul 08 '14 at 18:28
  • @MarkoTopolnik So, two things. 1) Streams effectively imposes a linear-typing constraint, so once you say `b = a.parallel()`, then `a` is *dead* -- it's been used up and is no longer available. 2) Conceptually, you can think of a stream as a lazy generator for a data set. When you invoke an intermediate op on a stream, you perform a composition on the underlying generator with some new generator behavior, and the new stream is a lazy generator for that composed data set (and you kill any other access to the input stream, so as to avoid access conflicts.) – Brian Goetz Jul 08 '14 at 18:34
  • @BrianGoetz Yes, that's about what my understanding of streams has evolved into. What I would propose is that some of this makes its way to some prominent place in the docs (package info or `Stream` javadoc) because it is quite subtle and doesn't easily lend itself to find out by trying (e.g., "was this aggregation run all in parallel, or was the first stage sequential, then the second one parallel?"). I think a single sentence would make wonders :) – Marko Topolnik Jul 08 '14 at 18:41
  • @MarkoTopolnik Please suggest something on lambda-dev! Small non-normative doc changes can generally be easily integrated. – Brian Goetz Jul 08 '14 at 19:05
  • @BrianGoetz After more reading of the `Stream` Javadoc I must conclude that it already covers what I was complaining about here, and that it was primarily my bias towards looking at streams as wrapping other streams which caused my misunderstanding. I can't find an easy way to improve the doc, which wouldn't sound artificial and unmotivated. – Marko Topolnik Jul 09 '14 at 09:13
  • @MarkoTopolnik OK, good to know! Also, you were probably biased by the fact that it used to work that way, and then we switched, and there wasn't a big party to herald the change :) – Brian Goetz Jul 09 '14 at 15:51
  • To clarify: this will not work well on GPUs because GPUs make certain HARDWARE trade-offs to achieve their high performance. If those trade-offs (vector ALUs, coalesced memory accesses, complicated memory model) aren't reflected in the programming model, you will not see performance gains. I don't know why the Stream team thinks it's ok to shove academic "parallel" mumbo-jumbo down ordinary developers' throats, but it's unappealing to have a facility that exposes some basic GPGPU concepts. – Aleksandr Dubinsky Jul 10 '14 at 20:29
  • @BrianGoetz You've said previously, "My argument was that the most common excuse for writing dirty code was because of perceived performance benefits, but unfortunately most code atrocities committed in the name of performance don't even have the desired performance effect...a lot of the time...additional performance is not a business requirement, so one should not pay anything for that." How do you reconcile that with, (quote above) "We've made a lot of design tradeoffs to merge sequential and parallel" – Aleksandr Dubinsky Jul 11 '14 at 00:03
  • @BrianGoetz (continuing...) and (quote above) "the whole API design here embodies many tensions between making it easy to express things...and doing...[it]...fast" and (*paraphrasing* last quote above) "people who want to just write some simple, imperative, non-multithreaded code with Streams are doing it all fn wrong"? – Aleksandr Dubinsky Jul 11 '14 at 00:04
  • I thought it was the same performance, but after reading this, I understand that parallelStream() is better, right? The reason why, as a naive developer, I was using stream().parallel() was for being able to comment/uncomment quickly and benchmark parallel() vs unparallel(). If I used parallelStream, I would rather have it called streamParallel so that I won't need to (un)capitalize the s while benchmarking... – Myoch May 05 '17 at 15:15
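
The linear-typing constraint Goetz describes in these comments is directly observable; a minimal sketch (map() is used here, but any intermediate operation uses up the handle it is invoked on):

```java
import java.util.stream.Stream;

public class StreamLinearity {
    public static void main(String[] args) {
        Stream<String> a = Stream.of("x", "y", "z");
        Stream<String> b = a.map(String::toUpperCase); // 'a' is consumed here

        System.out.println(b.count()); // fine: prints 3

        try {
            a.count(); // the old handle is dead
        } catch (IllegalStateException expected) {
            // "stream has already been operated upon or closed"
            System.out.println("'a' can no longer be used");
        }
    }
}
```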