7

I've gone through several previous questions like Encounter order preservation in java stream, this answer by Brian Goetz, as well as the javadoc for Stream.reduce(), and the java.util.stream package javadoc, and yet I still can't grasp the following:

Take this piece of code:

  public static void main(String... args) {
    final String[] alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".split("");
    System.out.println("Alphabet: ".concat(Arrays.toString(alphabet)));
    System.out.println(new HashSet<>(Arrays.asList(alphabet))
          .parallelStream()
          .unordered()
          .peek(System.out::println)
          .reduce("", (a,b) -> a + b, (a,b) -> a + b));
  }

Why is the reduction always* preserving the encounter order?

  • So far, after several dozen runs, output is the same
George Aristy
  • To add to Joe C's answer: a parallel stream expects the reduce (or any fold) function to be associative, i.e. (a+b)+c = a+(b+c), but it does not require commutativity. Although + on numbers is commutative, + in terms of string concatenation is not: "a"+"b" is not "b"+"a". All parallel stream algorithms assume folding (reducing) is not commutative, and guarantee to preserve left-right side when combining. It is better to write `.reduce("", (acc,x) -> acc + x, (a,b) -> a + b)`, showing that you add x to an accumulator and that the two lambdas are not necessarily the same (see the sketch after these comments). – Alex Pakka Aug 04 '17 at 22:21
  • "All parallel stream algorithms assume folding (reducing) is not commutative, and guarantee to preserve left-right side when combining." - where is this documented? – George Aristy Aug 04 '17 at 22:27
  • `unordered()` says the order is _undefined,_ not that it's guaranteed to be different from the original order or guaranteed to change. It can be whatever the implementation chooses. – Louis Wasserman Aug 04 '17 at 22:30
  • Louis, that is true. The documentation includes heavy use of the word "may". Does that mean that the implementation I'm using (openjdk version "1.8.0_131") just happens to always preserve encounter order? – George Aristy Aug 04 '17 at 22:40
  • @GeorgeAristy see my answer... but basically - yes – Eugene Aug 04 '17 at 22:42
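
To make the associativity-vs-commutativity point from the comments concrete, here is a minimal, hedged sketch (the class name is illustrative, not from the question): string concatenation is associative but not commutative, which is exactly the property a parallel reduce relies on.

    import java.util.Arrays;

    public class ConcatProperties {
        public static void main(String[] args) {
            String a = "A", b = "B", c = "C";

            // Associativity holds: (a + b) + c equals a + (b + c)
            System.out.println(((a + b) + c).equals(a + (b + c))); // true

            // Commutativity does not: "A" + "B" is not "B" + "A"
            System.out.println((a + b).equals(b + a));              // false

            // reduce only requires associativity; on an ordered parallel source
            // the left partial result is passed as the first combiner argument,
            // so the concatenation still follows encounter order.
            String result = Arrays.asList("A", "B", "C", "D")
                    .parallelStream()
                    .reduce("", (acc, x) -> acc + x, (left, right) -> left + right);
            System.out.println(result); // ABCD
        }
    }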

3 Answers

5

First of all, unordered does not imply an actual shuffling; all it does is set a flag for the Stream pipeline that could later be leveraged.

A shuffle of the source elements could potentially be much more expensive than the operations on the stream pipeline themselves, so the implementation might choose not to do this (as in this case).

At the moment (I tested and looked at the sources of jdk-8 and jdk-9), reduce does not take that flag into account. Notice that this could very well change in a future build or release.

Also, when you say unordered, you actually mean that you don't care about the order, so the stream returning the same result is not a violation of that.
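
As a hedged illustration (observed behaviour on the JDK 8/9 builds discussed here, not something the specification guarantees): reducing an ordered source in parallel with unordered() in the pipeline still yields the encounter-order result, while the peek output shows that the elements really are processed out of order.

    import java.util.Arrays;
    import java.util.List;

    public class UnorderedIsJustAFlag {
        public static void main(String[] args) {
            List<String> letters = Arrays.asList("ABCDEFGHIJKLMNOPQRSTUVWXYZ".split(""));

            String result = letters.parallelStream()
                    .unordered()                 // only clears the ORDERED flag; no shuffling happens
                    .peek(System.out::print)     // printed in whatever order the threads reach the elements
                    .reduce("", String::concat);

            System.out.println();
            // reduce does not currently exploit the unordered flag, so the combined
            // result still follows the source (encounter) order of the list:
            System.out.println(result);          // typically ABCDEFGHIJKLMNOPQRSTUVWXYZ
        }
    }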

For example, notice this question/answer, which explains that findFirst (just another terminal operation) changed to take unordered into consideration in java-9, as opposed to java-8.

Eugene
  • Thank you. I think my confusion stemmed from an expectation that the shuffling was guaranteed. – George Aristy Aug 05 '17 at 13:14
  • The question is about a parallel stream. And I would also expect that reduce, especially if the stream is marked as unordered, would take whatever elements, multiple at once, and then start to merge them in a random, first-come-first-served way. We do not mean intentional shuffling of elements here. – RenatoIvancic Apr 17 '20 at 14:47
  • @RenatoIvancic or `unordered` does nothing and simply behaves the same way - as we do not care about order. What is your actual question, or what clarification is needed here? – Eugene Apr 17 '20 at 14:50
  • I don't understand your definition of shuffling. I would expect that the .reduce() method on a parallel stream would spawn multiple threads and start to concatenate strings with the accumulator and combiner. As order cannot be guaranteed in properly parallel execution, I am still wondering how everything ends up in order. Or is there some step at the end of .reduce() itself that orders elements after parallel execution? I'll have to make some tests and check the source code. – RenatoIvancic Apr 17 '20 at 18:25
  • @RenatoIvancic I see, let me try to explain. _unless_ otherwise specified, every terminal operation (`reduce` included) will preserve the encounter order from the initial source (think `List` _not_ a `Set`), unless you break it (`list.stream().unordered()`). At the moment, calling `unordered` for `reduce` is not implemented in any form. Why? Because of how the implementation is made internally, the only way to achieve this would be via shuffling the initial source, on purpose. Such a route is not taken. – Eugene Apr 17 '20 at 18:38
  • Now, after a bit of digging, I came to my own conclusion of how the `.reduce()` operation applies encounter ordering. It checks whether the spliterator (internal implementation detail) is ordered, and it takes only this flag into consideration. The example of @George Aristy is not ordered, as he uses a HashSet. It's just a coincidence that he uses this unfortunate input, for which the unordered result is the same as the ordered one. So `.reduce()` actually is, in your words, "shuffling" the result if you use any other fitting input instead of "ABCDEFG..." – RenatoIvancic Apr 20 '20 at 19:25
3

To help explain this, I am going to reduce the scope of this string to ABCD.

The parallel stream will divide the string into two pieces: AB and CD. When we go to combine these later, the result of the AB side will be the first argument passed into the function, while the result of the CD side will be the second argument passed into the function. This is regardless of which of the two actually finishes first.

The unordered operator will affect some operations on a stream, such as a limit operation, but it does not affect a simple reduce.
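
A hedged way to visualize this (a sketch added here, not part of the original answer): build a parenthesized expression instead of a plain concatenation, so every combiner call records which partial result arrived on the left and which on the right.

    import java.util.Arrays;

    public class CombinerOrder {
        public static void main(String[] args) {
            String tree = Arrays.asList("A", "B", "C", "D")
                    .parallelStream()
                    .reduce("",
                            // accumulator: fold the next element into the partial result
                            (acc, x) -> acc.isEmpty() ? x : "(" + acc + "+" + x + ")",
                            // combiner: the left part of the source is always the first argument
                            (left, right) -> "(" + left + "+" + right + ")");

            // The exact split points depend on the spliterator and the fork/join pool,
            // but the output looks like e.g. ((A+B)+(C+D)) -- never ((C+D)+(A+B)).
            System.out.println(tree);
        }
    }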

Joe C
  • As an aside, I should note that the stream returned by a `HashSet` is already unordered. – Joe C Aug 04 '17 at 21:41
  • [joe C](https://stackoverflow.com/users/6815131/joe-c), after the stream is split and its parts processed in parallel, how is the order preserved afterwards? – George Aristy Aug 04 '17 at 22:31
  • @JoeC well.. this is a bit off, *at the moment* `unordered` is not taken into consideration for `reduce` - it could easily change... – Eugene Aug 04 '17 at 22:38
  • @GeorgeAristy that's a deep implementation detail - but *lots* of terminal operations are carefully made to preserve the order - that is *source encounter order* – Eugene Aug 04 '17 at 22:40
2

TL;DR: .reduce() does not always preserve order; its result is based on the stream's spliterator characteristics.

Spliterator

The encounter order of the stream depends on its spliterator (none of the other answers mentioned that).

There are different spliterators based on the source stream. You can get the types of spliterators from the source code of those collections.

HashSet -> HashMap#KeySpliterator = Not ordered

ArrayDeque = Ordered

ArrayList = Ordered

TreeSet -> TreeMap#Spliterator = Ordered and sorted

References: logicbig.com - Ordering; logicbig.com - Stateful vs Stateless
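
A hedged sketch (class name is mine) to verify these characteristics yourself by querying Spliterator.hasCharacteristics on the source collections:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collection;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Spliterator;
    import java.util.TreeSet;

    public class SpliteratorCharacteristics {
        public static void main(String[] args) {
            List<String> data = Arrays.asList("A", "B", "C", "D");

            print("HashSet  ", new HashSet<>(data));
            print("ArrayList", new ArrayList<>(data));
            print("TreeSet  ", new TreeSet<>(data));
        }

        static void print(String name, Collection<String> c) {
            Spliterator<String> s = c.spliterator();
            // Expected: HashSet false/false, ArrayList true/false, TreeSet true/true
            System.out.println(name
                    + " ORDERED=" + s.hasCharacteristics(Spliterator.ORDERED)
                    + " SORTED="  + s.hasCharacteristics(Spliterator.SORTED));
        }
    }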

Additionally, you can apply the .unordered() intermediate stream operation, which specifies that the following operations in the stream should not rely on ordering.

Stream operations (mostly stateful ones) that are affected by the spliterator and by use of the .unordered() method are:

  • .findFirst()
  • .limit()
  • .skip()
  • .distinct()

Those operations will give us different results based on the order property of the stream and its spliterator.

The .peek() method does not take ordering into consideration; if the stream is executed in parallel, it will print/receive elements in an unordered manner.
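
As a sketch of how the unordered flag can show up in practice (the unordered variant's behaviour is intentionally unspecified, so its output may or may not differ from run to run and across JDK versions), compare limit(5) on a parallel stream with and without .unordered():

    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class UnorderedLimit {
        public static void main(String[] args) {
            // Ordered parallel stream: limit(5) must keep the first five elements.
            List<Integer> ordered = IntStream.rangeClosed(1, 1_000_000)
                    .parallel()
                    .boxed()
                    .limit(5)
                    .collect(Collectors.toList());

            // Unordered parallel stream: limit(5) may keep any five elements.
            List<Integer> unordered = IntStream.rangeClosed(1, 1_000_000)
                    .parallel()
                    .boxed()
                    .unordered()
                    .limit(5)
                    .collect(Collectors.toList());

            System.out.println(ordered);   // always [1, 2, 3, 4, 5]
            System.out.println(unordered); // any five elements the implementation picks
        }
    }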

.reduce()

Now for the terminal .reduce() method. The intermediate .unordered() operation doesn't have any effect on the type of the spliterator (as @Eugene mentioned); the ordering stays the same as in the source spliterator. If the source spliterator is ordered, the result of .reduce() will be ordered; if the source is unordered, the result of .reduce() will be unordered.

You are using new HashSet<>(Arrays.asList(alphabet)) as the source of the stream. Its spliterator is unordered. It is just a coincidence that your result appears ordered: because you are using single alphabet Strings as the stream elements, the unordered result happens to look the same. If you mixed in numbers, or mixed lower case and upper case, this would no longer hold. For example, take the following inputs (the first one is a subset of the example you posted):

HashSet .reduce() - Unordered

"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "a1Ab2Bc3C"
"Apple","Orange","Banana","Mango" -> "AppleMangoOrangeBanana"

TreeSet .reduce() - Ordered, Sorted

"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "123ABCabc"
"Apple","Orange","Banana","Mango" -> "AppleBananaMangoOrange"

ArrayList .reduce() - Ordered

"A","B","C","D","E","F" -> "ABCDEF"
"a","b","c","1","2","3","A","B","C" -> "abc123ABC"
"Apple","Orange","Banana","Mango" -> "AppleOrangeBananaMango"

You can see that testing the .reduce() operation only with an alphabet source stream can lead to false conclusions.

The answer: .reduce() does not always preserve order; its result is based on the stream's spliterator characteristics.
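
Here is a hedged sketch (class and method names are mine) that reproduces the comparison above; the HashSet line depends on hash ordering, so its exact output may vary across inputs and JDK versions:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collection;
    import java.util.HashSet;
    import java.util.List;
    import java.util.TreeSet;

    public class ReduceEncounterOrder {
        public static void main(String[] args) {
            List<String> input = Arrays.asList("a", "b", "c", "1", "2", "3", "A", "B", "C");

            reduce("HashSet  ", new HashSet<>(input));   // unordered source -> unordered result
            reduce("TreeSet  ", new TreeSet<>(input));   // sorted source    -> sorted result
            reduce("ArrayList", new ArrayList<>(input)); // ordered source   -> insertion-order result
        }

        static void reduce(String name, Collection<String> source) {
            String result = source.parallelStream()
                    .reduce("", (acc, x) -> acc + x, (left, right) -> left + right);
            System.out.println(name + " -> " + result);
        }
    }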

RenatoIvancic