21

As I see it, the obvious code, when using Java 8 Streams, whether they be "object" streams or primitive streams (that is, IntStream and friends) would be to just use:

someStreamableResource.stream().whatever()

But then, quite a few "streamable resources" also have .parallelStream().

What isn't clear when reading the javadoc is whether .stream() streams are always sequential, and whether .parallelStream() streams are always parallel...

And then there is Spliterator, and in particular its .characteristics(), one of them being that it can be CONCURRENT, or even IMMUTABLE.

My gut feeling is that whether a Stream is parallel by default, or can be parallel at all, is governed by its underlying Spliterator...

Am I on the right track? I have read, and read again, the javadocs, and still cannot come up with a clear answer to this question...

Jeffrey Bosboom
fge

5 Answers

15

First, through the lens of specification. Whether a stream is parallel or sequential is part of a stream's state. Stream-creation methods should specify whether they create a sequential or parallel stream (and most in the JDK do), but they are not required to say so. If your stream source doesn't say, don't assume. If someone passes you a stream, don't assume.

Parallel streams are allowed to fall back to sequential at their discretion (since a sequential implementation is a parallel implementation, just a potentially imperfect one); the opposite is not true.

Now, through the lens of implementation. In the stream-creation methods in Collections and other JDK classes, we stick to a discipline of "create a sequential stream unless the user explicitly asks for parallelism". (Other libraries, however, make different choices. If they're polite, they'll specify their behavior.)
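To make the "part of a stream's state" point concrete, here is a minimal sketch that queries that state with `isParallel()`. The class and method names (`StreamModeDemo`, `describe`, and so on) are made up for illustration; only `stream()`, `parallelStream()`, `parallel()`, `sequential()` and `isParallel()` are real API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: the names here are illustrative, not from the JDK.
final class StreamModeDemo {
    static final List<Integer> DATA = Arrays.asList(1, 2, 3, 4);

    // Collection.stream() is documented to return a sequential stream.
    static boolean streamIsParallel() {
        return DATA.stream().isParallel();
    }

    // parallel() switches the pipeline's mode to parallel.
    static boolean afterParallel() {
        return DATA.stream().parallel().isParallel();
    }

    // sequential() switches it back, whatever the source returned.
    static boolean afterSequential() {
        return DATA.parallelStream().sequential().isParallel();
    }

    // When handed a stream of unknown origin, ask it instead of assuming.
    static String describe(Stream<?> s) {
        return s.isParallel() ? "parallel" : "sequential";
    }

    public static void main(String[] args) {
        System.out.println("stream():         " + describe(DATA.stream()));
        // parallelStream() is only "possibly parallel" per its documentation,
        // so the result is printed here rather than assumed.
        System.out.println("parallelStream(): " + describe(DATA.parallelStream()));
        System.out.println("after parallel(): " + afterParallel());
        System.out.println("after sequential(): " + afterSequential());
    }
}
```

If you receive a stream and care about the mode, calling `.sequential()` or `.parallel()` yourself is simpler than inspecting it; the mode in effect when the terminal operation starts applies to the whole pipeline.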

The relationship between stream parallelism and Spliterator only goes in one direction. A Spliterator can refuse to split -- effectively denying any parallelism -- but it can't demand that a client split it. So an uncooperative Spliterator can undermine parallelism, but not determine it.
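The one-directional relationship can be sketched with a spliterator whose `trySplit()` always returns `null`. `NoSplitRange` and `NoSplitDemo` are hypothetical names invented for this example. The stream built from it is still marked parallel, and the result is still correct, but there is nothing for the framework to hand to other threads.

```java
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Hypothetical demo class: an uncooperative source behind a "parallel" stream.
final class NoSplitDemo {

    // Illustrative spliterator over the ints [0, end) that refuses to split.
    static final class NoSplitRange implements Spliterator<Integer> {
        private int next = 0;
        private final int end;
        // Counts how often the framework asked for a split.
        final AtomicInteger splitRequests = new AtomicInteger();

        NoSplitRange(int end) {
            this.end = end;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (next >= end) {
                return false;
            }
            action.accept(next++);
            return true;
        }

        @Override
        public Spliterator<Integer> trySplit() {
            splitRequests.incrementAndGet();
            return null; // refuse: the source cannot be partitioned
        }

        @Override
        public long estimateSize() {
            return end - next;
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED;
        }
    }

    // Requests parallelism (second argument true) over the unsplittable source.
    static int parallelSum(NoSplitRange source) {
        return StreamSupport.stream(source, true)
                .mapToInt(Integer::intValue)
                .sum();
    }

    // The stream's own state still says "parallel".
    static boolean reportsParallel(NoSplitRange source) {
        return StreamSupport.stream(source, true).isParallel();
    }

    public static void main(String[] args) {
        NoSplitRange source = new NoSplitRange(100);
        System.out.println("sum = " + parallelSum(source));
        System.out.println("split requests = " + source.splitRequests.get());
        System.out.println("isParallel = " + reportsParallel(new NoSplitRange(100)));
    }
}
```

Note this only shows the spliterator declining to cooperate; it is not a documented guarantee that no implementation could find parallelism elsewhere, which is the point raised in the comments below.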

Brian Goetz
  • "A Spliterator can refuse to split -- effectively denying any parallelism" - the Spliterator documentation is not very clear on that point: "Operations using a Spliterator that cannot split... are **unlikely** to benefit from parallelism." Is there a Stream implementation that can parallelize itself without `trySplit`? Is there a more definitive way to force a stream to remain sequential? – shmosel Jun 01 '15 at 07:14
  • 1
    @shmosel Calling `.sequential()` requires the stream to go sequential. But parallelism can only be requested, not required, as there are many factors that all need to come together to achieve actual parallelism. – Brian Goetz Jun 01 '15 at 13:18
  • I'm asking as the stream creator. Is there a way to create a Spliterator or a Stream that cannot be parallelized? I would think that could be accomplished by not implementing `trySplit`, but the documentation is not clear on that point. – shmosel Jun 01 '15 at 16:46
  • @shmosel Usually the "stream creator" (I assume you mean client?) is not the implementor of the spliterator. If you want a sequential stream, call .sequential() on the stream. – Brian Goetz Jun 01 '15 at 17:08
  • No, I mean the implementor. – shmosel Jun 01 '15 at 17:10
  • @shmosel in that case, just make `.trySplit()` return `null` – fge Aug 24 '17 at 20:25
  • @fge The documentation isn't clear on whether that's enough to prevent parallelization. See my initial comment. – shmosel Aug 24 '17 at 21:11
2

The API doesn't have much to say on the matter:

Streams are created with an initial choice of sequential or parallel execution. (For example, Collection.stream() creates a sequential stream, and Collection.parallelStream() creates a parallel one.)

Regarding your line of reasoning that some intermediate operations may not be thread safe, you may want to read the package summary. The package summary discusses intermediate operations, stateful vs stateless, and how to properly use a Stream in some depth.

Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.

Behavioral parameters being the arguments given to stateless intermediate operations.
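A small sketch of what that advice means in practice, with made-up names (`SideEffectDemo`, `evens`): instead of having a lambda write into a shared list, let the stream accumulate the result through `collect`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class contrasting a side-effecting lambda with collect().
final class SideEffectDemo {

    // Discouraged: the lambda mutates shared state, so a parallel run would
    // race on the non-thread-safe ArrayList (lost elements or exceptions).
    //
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.range(0, n).parallel()
    //            .filter(i -> i % 2 == 0)
    //            .forEach(out::add);

    // Preferred: no shared mutable state; the stream does the accumulation.
    static List<Integer> evens(int n) {
        return IntStream.range(0, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens(10));
    }
}
```

The second form gives the same result whether the stream is sequential or parallel, which is why it does not matter who decided the mode.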

the API cannot make any assumptions

The API can make any assumption it wishes. The onus is on the user of the API to meet those assumptions. However, assumptions may limit usability. The Stream API discourages the creation of a stateless intermediate operation that is not thread-safe. Since it is discouraged instead of prohibited, most Streams will be sequential "by default".

Jeffrey
  • Thank you for giving me a new light on what I have already read but obviously was too obtuse to spot... – fge Jan 14 '15 at 03:56
1

Well, answer to self...

After thinking about it a little more seriously (go figure, such things only happen after I actually ask the question), I actually came up with a reason why...

Intermediate operations may NOT be thread safe; as such, the API cannot make any assumptions. Hence, if users want a parallel stream, they have to ask for it explicitly and ensure that all intermediate operations used in the stream are thread safe.

There is however the somewhat misleading case of Collectors; since a Collector cannot know in advance whether it will be called as a terminal operation on a stream which is parallel or not, the contract makes it clear that "just to be safe", any Collector must be thread safe.

fge
  • 2
    Actually, [the documentation of `Collector`](http://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html) doesn’t mandate that a `Collector` must be thread safe. It’s the responsibility of the `Stream` implementation to use the `Collector` correctly to ensure thread safety. On the other hand, it’s required that “the collector functions must satisfy an identity and an associativity constraints” which is required to make the overall operation thread-safe… – Holger Jan 14 '15 at 10:22
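To illustrate the point made in the comment above, here is a sketch of a collector assembled from non-thread-safe parts (`ArrayList`, plain `add`/`addAll`) and used on a parallel stream. `CollectorDemo` and `toArrayList` are names invented for this example; `Collector.of` is the real factory method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical demo class: a collector whose container is not thread safe.
final class CollectorDemo {

    // Not declared CONCURRENT, so the stream gives each partition its own
    // ArrayList and merges them with the combiner afterwards.
    static Collector<Integer, List<Integer>, List<Integer>> toArrayList() {
        return Collector.of(
                ArrayList::new,                 // supplier: fresh container per partition
                List::add,                      // accumulator: called on one container at a time
                (left, right) -> {              // combiner: merge two partial results
                    left.addAll(right);
                    return left;
                });
    }

    static List<Integer> collectParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .boxed()
                .collect(toArrayList());
    }

    public static void main(String[] args) {
        List<Integer> result = collectParallel(1000);
        System.out.println("collected " + result.size() + " elements");
    }
}
```

The functions only have to respect the identity and associativity constraints; it is the stream implementation that keeps the individual containers isolated.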
1

It is mentioned here: "When you create a stream, it is always a serial stream unless otherwise specified." And here: "It is allowable for this method (parallelStream) to return a sequential stream."

CONCURRENT and IMMUTABLE aren't (directly) related to this. They specify whether the underlying collection can be modified without rendering the spliterator invalid, or whether it is immutable, respectively. The feature of Spliterator that pretty much defines the behavior of parallelStream is trySplit. Terminal operations on a parallel stream will eventually invoke trySplit, and whatever that implementation does will, at the end of the day, define what parts, if any, of the data are processed in parallel.
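Both points can be poked at directly. The sketch below (the class and method names are hypothetical) calls `trySplit()` once on an `ArrayList` spliterator to see how the data would be partitioned, and checks the CONCURRENT flag on two JDK collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical demo class for inspecting a spliterator by hand.
final class SplitInspection {

    // Splits once and returns { size of the split-off prefix, size of the rest }.
    static long[] splitSizes(List<Integer> list) {
        Spliterator<Integer> rest = list.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // may be null if it refuses
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    // CONCURRENT describes safe concurrent modification of the source,
    // not whether a stream over it runs in parallel.
    static boolean isConcurrent(Spliterator<?> s) {
        return s.hasCharacteristics(Spliterator.CONCURRENT);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        long[] sizes = splitSizes(list);
        System.out.println("prefix=" + sizes[0] + ", rest=" + sizes[1]);
        System.out.println("ArrayList CONCURRENT? " + isConcurrent(list.spliterator()));
        System.out.println("ConcurrentLinkedQueue CONCURRENT? "
                + isConcurrent(new ConcurrentLinkedQueue<Integer>().spliterator()));
    }
}
```

An `ArrayList` splits happily yet is not CONCURRENT, which shows the two properties are independent.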

Dima
1

This aspect is not constrained by the specification right now; however, the short answer is NO. The parallelStream() and stream() functions exist, but they just give you access to parallel or sequential implementations of the common basic operations used to process the stream. Currently the runtime can't assume that your operations are thread safe without an explicit parallelStream() or parallel() call, so the default implementation of stream() has sequential behavior.