29

The Javadoc for Stream.forEach says (emphasis mine):

The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism. For any given element, the action may be performed at whatever time and in whatever thread the library chooses. If the action accesses shared state, it is responsible for providing the required synchronization.

The same text is present in the Java 9 Early Access Javadoc.

The first sentence ("explicitly nondeterministic") suggests (but doesn't explicitly say) that encounter order is not preserved by this method. But the next sentence, that explicitly says order is not preserved, is conditioned on "For parallel stream pipelines", and that condition would be unnecessary if the sentence applied regardless of parallelism. That leaves me unsure whether forEach preserves encounter order for sequential streams.

This answer points out a spot where the streams library implementation calls .sequential().forEach(downstream). That suggests forEach is intended to preserve order for sequential streams, but could also just be a bug in the library.

I've sidestepped this ambiguity in my own code by using forEachOrdered to be on the safe side, but today I discovered that NetBeans IDE's "use functional operations" editor hint will convert

for (Foo foo : collection)
    foo.bar();

into

collection.stream().forEach((foo) -> {
    foo.bar();
});

which introduces a bug if forEach does not preserve encounter order. Before I report a bug against NetBeans, I want to know what the library actually guarantees, backed up by a source.
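For comparison, here is a minimal sketch of rewrites that do not rely on `Stream.forEach` for ordering. It assumes a `List<String>` in place of the `collection` of `Foo` above; the class and helper names are hypothetical. `Iterable.forEach` is documented to perform actions in iteration order when that order is specified, and `Stream.forEachOrdered` is documented to use the encounter order when the stream has one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; the names here are illustrative, not from the post.
class OrderedTraversal {
    // Original form: an enhanced for loop visits elements in iteration order.
    static List<String> viaLoop(List<String> items) {
        List<String> seen = new ArrayList<>();
        for (String s : items) {
            seen.add(s);
        }
        return seen;
    }

    // Iterable.forEach: no stream involved, actions run in iteration order
    // when the collection specifies one (a List does).
    static List<String> viaIterableForEach(List<String> items) {
        List<String> seen = new ArrayList<>();
        items.forEach(seen::add);
        return seen;
    }

    // Stream.forEachOrdered: the ordering requirement is stated in the code.
    static List<String> viaForEachOrdered(List<String> items) {
        List<String> seen = new ArrayList<>();
        items.stream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");
        System.out.println(viaLoop(items));
        System.out.println(viaIterableForEach(items));
        System.out.println(viaForEachOrdered(items));
    }
}
```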

I'm looking for an answer drawing from authoritative sources. That could be an explicit comment in the library implementation, discussion on the Java development mailing lists (Google didn't find anything for me but maybe I don't know the magic words), or a statement from the library designers (of which I know two, Brian Goetz and Stuart Marks, are active on Stack Overflow). (Please do not answer with "just use forEachOrdered instead" -- I already do, but I want to know if code that doesn't is wrong.)

Jeffrey Bosboom
  • 5
    The documentation is quite clear. `forEach` ignores the "ordered" property of all streams, parallel/sequential, ordered/unordered. There is no exception. Even if there is no code in the current streams implementation that would result in non-encounter order for sequential streams, that option is left open for the future. – zapl Dec 13 '15 at 03:33
  • 6
    The question is actually deeper: whether unordered stream characteristic matters for sequential stream (or may matter in future). Currently sequential unordered stream is always processed like ordered, but many optimizations could be performed when source or terminal op is unordered. For example, `.distinct()` uses `LinkedHashSet` to preserve the order. It could use simply `HashSet` with less overhead if terminal op is `forEach` or `findAny`. – Tagir Valeev Dec 13 '15 at 04:51
  • Also, it's fairly common to assume that in-order processing is necessary even in cases where it is not. This is a side-effect of the pervasive sequentiality that we have all grown up with; in-order has been so easy and so cheap for so long that it's easy to not question the assumption that it is needed, even when it isn't. – Brian Goetz Dec 13 '15 at 18:35
  • I filed [NetBeans bug 257129](https://netbeans.org/bugzilla/show_bug.cgi?id=257129). – Jeffrey Bosboom Dec 13 '15 at 21:06
  • 1
    Does NetBeans really do that? Why would you call `collection.stream().forEach()` instead of simply `collection.forEach()`? – shmosel Dec 14 '17 at 08:26
  • I would argue that `forEach`, at least in theory, could enable some fairly aggressive optimizations even in sequential pipelines. For example, `.sorted().forEach()` could just elide the sorting. – the8472 Jan 01 '18 at 06:50

1 Answer

20

Specifications exist to describe the minimal guarantees a caller can depend on, not to describe what the implementation does. This gap is crucial, as it allows the implementation flexibility to evolve. (Specification is declarative; implementation is imperative.) Overspecification is just as bad as underspecification.

When a specification says "does not preserve property X", it does not mean that the property X may never be observed; it means the implementation is not obligated to preserve it. Your claimed implication that encounter order is never preserved is simply a wrong conclusion. (HashSet doesn't promise that iterating its elements preserves the order they were inserted, but that doesn't mean this can't accidentally happen -- you just can't count on it.)
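The `HashSet` analogy can be sketched as follows. This is only an illustration with hypothetical names: `LinkedHashSet` specifies insertion-order iteration, so code may rely on it, while `HashSet` specifies no order, so the only thing safe to rely on is which elements come back, not their sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class illustrating "not guaranteed" versus "never happens".
class IterationOrderDemo {
    // Adds the items to the given set, then returns them in the set's iteration order.
    static List<String> insertThenIterate(Set<String> set, List<String> items) {
        set.addAll(items);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("pear", "apple", "fig");

        // Specified: iteration follows insertion order.
        System.out.println(insertThenIterate(new LinkedHashSet<>(), items));

        // Unspecified: this may or may not match insertion order on a given
        // run or JDK; either outcome is consistent with the specification.
        System.out.println(insertThenIterate(new HashSet<>(), items));
    }
}
```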

Similarly, your implication of "that suggests forEach is intended to preserve order for sequential streams" because you saw an implementation that does so in some case is equally incorrect.

In both cases, it seems like you're just uncomfortable with the fact that the specification gives forEach a great deal of freedom. Specifically, it has the freedom to not preserve encounter order for sequential streams, even though preserving it is what the implementation currently does, and even though it's kind of hard to imagine an implementation going out of its way to process sequential sources out of order. But that's what the spec says, and that's what it was intended to say.
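As a rough illustration of the difference between the two contracts, the sketch below (hypothetical class and method names) runs both terminal operations on a parallel stream. The only property it relies on for `forEach` is that every element is visited once; the only place it relies on order is `forEachOrdered`. The shared list is synchronized because, as the Javadoc quoted in the question says, the action is responsible for its own synchronization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; not taken from the streams library or the post.
class ForEachOrderDemo {
    // forEach: visit order is unspecified, so callers should only depend on
    // the set of elements visited.
    static List<Integer> collectWithForEach(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(seen::add);
        return new ArrayList<>(seen);
    }

    // forEachOrdered: elements are processed in encounter order when the
    // stream has one, even if the pipeline is parallel.
    static List<Integer> collectWithForEachOrdered(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(seen::add);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
        System.out.println("forEachOrdered kept order: "
                + collectWithForEachOrdered(source).equals(source));
        System.out.println("forEach kept order (not guaranteed either way): "
                + collectWithForEach(source).equals(source));
    }
}
```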

That said, the wording of the comment about parallel streams is potentially confusing, because it is still possible to misinterpret it. The intent of calling out the parallel case explicitly here was pedagogical; the spec is still perfectly clear with that sentence removed entirely. However, to a reader who is unaware of parallelism, it would be almost impossible to not assume that forEach would preserve encounter order, so this sentence was added to help clarify the motivation. But, as you point out, the desire to treat the sequential case specially is still so powerful that it would be beneficial to clarify further.

Jeffrey Bosboom
Brian Goetz
  • 2
    I'm totally on board with specifications allowing implementation freedom. I just wanted an unambiguous statement so I didn't have to argue with the NetBeans developers about their bug. (As for "never", I was thinking "regardless of whether the stream is ordered", not "must shuffle the order". But of course an unordered stream has no order to preserve anyway.) In addition to clarifying the text, maybe adding `@see forEachOrdered` would show the developer there is a choice to be made. – Jeffrey Bosboom Dec 13 '15 at 20:38
  • 1
    @JeffreyBosboom Yes, that will also help. The key issue here (and this is obviously harder than it looks, since we didn't get it 100% right the first time) is helping people get over their pervasive sequential bias. Even the notion of "has a defined encounter order" is confusing to a lot of people! – Brian Goetz Dec 13 '15 at 21:08
  • 1
    I think the intent could be clarified by removing the words "guarantee to" from the second sentence. That would more clearly convey that it's an implementation note rather than a specification detail unique to parallel streams. – shmosel Dec 14 '17 at 08:17
  • 2
    @shmosel Based on experience, I don't think that would be a clarification, I just think it would confuse a different group of people than the current version. If it said "does not respect", some people would surely assume that we go out of our way to disrespect the order, not unlike collections whose iteration order is randomized. (Writing spec for stuff that 10M people are going to use isn't so easy. This may sound absurd, but we see stuff like this all the time.) – Brian Goetz Dec 14 '17 at 14:45
  • I hear, but I still think it would be objectively more correct. Anyway, while I have you here, do the next sentences about thread-safety also apply to sequential streams? And if so, *why*? Why make the spec so loose and counterintuitive? – shmosel Dec 14 '17 at 18:54
  • 3
    @shmosel How about just writing the code that clearly captures your semantic expectations? If you care about encounter order, just use `forEachOrdered`, and now your code is clear. And `forEachOrdered` on sequential streams is no more expensive than `forEach`. At root, I think what you are really saying is "please don't make me reason about hard concepts like ordering." For a framework that supports low-ceremony parallelism, I don't think that would be doing anyone favors. – Brian Goetz Dec 14 '17 at 19:09
  • Having used streams for some time, I wouldn't mind a little more ceremony around the rare case of parallelism if it simplified reduction and removed constraints on ordering, statefulness, etc. for the far more common sequential use case. Do you still feel the tradeoff was worthwhile? – shmosel Feb 16 '22 at 21:08
  • 1
    The wording of the documentation is not “potentially confusing”, it is definitely, without any doubt, outright confusing. If you want to express “*XYZ does always apply*”, it is, of course, misleading to have the statement “*For parallel stream pipelines, XYZ applies*” and no other mention of XYZ at all. That’s [how the documentation still is written](https://download.java.net/java/early_access/jdk19/docs/api/java.base/java/util/stream/Stream.html#forEach(java.util.function.Consumer)). The only mention of “encounter order” is the statement starting with “For parallel stream pipelines”. – Holger May 04 '22 at 11:32
  • 1
    @Holger Feel free to submit a PR to improve the wording! – Brian Goetz May 04 '22 at 15:10