What does the Java 8 Collector UNORDERED characteristic mean?

Question

In official documentation you can read that:

UNORDERED Indicates that the collection operation does not commit to preserving the encounter order of input elements.

This is not too helpful without any examples.

My question is, what exactly does UNORDERED characteristic mean? Should I use it with reducing collectors like min or sum or is it only applicable to collection collectors?

In OpenJDK looks like reducing operations (min, sum, avg) have empty characteristics. I expected to find there at least CONCURRENT and UNORDERED.

UNORDERED just means not ordered. There is no guarantee as to what order you will get the results. — Peter Lawrey, Oct 09 '16 at 09:45
*"This is not too helpful without any examples."* - you're limiting your potential answer space by expecting code samples. Lots of specs and documentation only come in prose. — the8472, Oct 09 '16 at 11:00
Note that `Collectors.toSet()` has the `UNORDERED` characteristic, for example… — Holger, Oct 10 '16 at 15:29

Brian Goetz · Answer 1 · 2020-03-21T21:02:51.003

In the absence of special pleading, stream operations must behave as if the elements are processed in the encounter order of the source. For some operations -- such as reduction with an associative operation -- one can obey this constraint and still get efficient parallel execution. For others, though, this constraint is very limiting. And, for some problems, this constraint isn't meaningful to the user. Consider the following stream pipeline:

people.stream()
      .collect(groupingBy(Person::getLastName, 
                          mapping(Person::getFirstName));

Is it important that the list of first names associated with "Smith" appear in the map in the order they appeared in the initial stream? For some problems, yes, for some no -- we don't want the stream library guessing for us. An unordered collector says that it's OK to insert the first names into the list in an order inconsistent with the order in which Smith-surnamed people appear in the input source. By relaxing this constraint, sometimes (not always), the stream library can give a more efficient execution.

For example, if you didn't care about this order preservation, you could execute it as:

people.parallelStream()
      .collect(groupingByConcurrent(Person::getLastName, 
                                    mapping(Person::getFirstName));

The concurrent collector is unordered, which permits the optimization of sharing an underlying ConcurrentMap, rather than having O(log n) map-merge steps. Relaxing the ordering constraint enables a real algorithmic advantage -- but we can't assume the constraint doesn't matter, we need for the user to tell us this. Using an UNORDERED collector is one way to tell the stream library that these optimizations are fair game.

This doesn’t answer the question why built-in collectors like `summingInt` don’t have the `UNORDERED` characteristic… — Holger, Oct 10 '16 at 15:27
It’s misleading to say “Collectors like `summingInt` respect the encounter order” as it’s actually the Stream implementation that is forced to respect the encounter order due to the collector not reporting `UNORDERED`. So it should not be up to the Collector to decide that the possible savings in the Stream’s effort have no benefit. How can the Collector know that? Even, if there is no benefit *today*, let’s ask the other way round: what is the benefit of *not* reporting `UNORDERED` characteristic? It’s just a decision of passing `CH_NOID` or `CH_UNORDERED_ID` to the constructor… — Holger, Oct 10 '16 at 16:07

Marko Topolnik · Accepted Answer · 2016-10-12T08:44:39.050

8

UNORDERED essentially means that the collector is both associative (required by the spec) and commutative (not required).

Associativity allows splitting the computation into subparts and then combining them into the full result, but requires the combining step to be strictly ordered. Examine this snippet from the docs:

 A a2 = supplier.get();
 accumulator.accept(a2, t1);
 A a3 = supplier.get();
 accumulator.accept(a3, t2);
 R r2 = finisher.apply(combiner.apply(a2, a3));  // result with splitting

In the last step, combiner.apply(a2, a3), the arguments must appear in exactly this order, which means that the entire computation pipeline must track the order and respect it in the end.

Another way of saying this is that the tree we get from recursive splitting must be ordered.

On the other hand, if the combining operation is commutative, we can combine any subpart with any other, in no particular order, and always obtain the same result. Clearly this leads to many optimization opportunities in both space and time dimensions.

It should be noted that there are UNORDERED collectors in the JDK which don't guarantee commutativity. The main category are the "higher-order" collectors which are composed with other downstream collectors, but they don't enforce the UNORDERED property on them.

edited Oct 12 '16 at 08:44

answered Oct 09 '16 at 10:19

Marko Topolnik

195,646
29
319
436

I would be careful in saying that `UNORDERED` implies being commutative. It doesn’t require that swapped order yields the same result, it may imply that the difference doesn’t matter. – Holger Oct 10 '16 at 15:32
I was awaiting you, Holger ☺ In my view, "the difference doesn't matter" is the defining property of "same", for all practical purposes. – Marko Topolnik Oct 10 '16 at 17:21
I accept that point of view, but I’m not sure whether the reader understands that `Collectors.toList()` is commutative when being used within a `groupingByConcurrent` but not when being used in a `groupingBy`, as normally, the commutative property of a function is invariant. – Holger Oct 11 '16 at 13:45
I'm not sure what you meant here: `toList` is not an `UNORDERED` collector so by the above definition it doesn't claim to be commutative. Can you explain? – Marko Topolnik Oct 12 '16 at 05:49
`groupingByConcurrent(function)` is equivalent to `groupingByConcurrent(function, toList())`, which isn’t commutative. But it reports an `UNORDERED` characteristic, ignoring the fact that `toList()` is not `UNORDERED`. So `toList()` still isn’t commutative but by using it in the context of `groupingByConcurrent`, you’ll implicitly state that you accept the unreliable order of the resulting lists. – Holger Oct 12 '16 at 08:25
In terms of invariants this still works: `toList` keeps not declaring itself as commutative (I think that's more precise than "declaring non-commutative") and it is up to the user to decide that for his purpose `toList` can be accepted as commutative. On the other hand, `groupingByConcurrent` might be accused of "lying" to us because it doesn't enforce commutativity of its child collector. – Marko Topolnik Oct 12 '16 at 08:32
As said, I accept your point of view, but considering that the average Java developer isn’t familiar with functional programming terms anyway, I would avoid the term “commutative” here, as it’s much easier to understand, to say that `UNORDERED` means either, order independence or that the possibly different results are considered equivalent by the user. – Holger Oct 12 '16 at 08:40
The main point I wanted to make in this answer is the connection between commutativity and `UNORDERED`. I guess you can't satisfy everyone in every answer. I added a warning at the end. – Marko Topolnik Oct 12 '16 at 08:42

the8472 · Answer 3 · 2016-10-09T11:18:33.107

The inner Collector.Characteristics class itself is fairly terse in its description, but if you spend a few seconds exploring the context you will notice that the containing Collector interface provides additional information

For collectors that do not have the UNORDERED characteristic, two accumulated results a1 and a2 are equivalent if finisher.apply(a1).equals(finisher.apply(a2)). For unordered collectors, equivalence is relaxed to allow for non-equality related to differences in order. (For example, an unordered collector that accumulated elements to a List would consider two lists equivalent if they contained the same elements, ignoring order.)

In OpenJDK looks like reducing operations (min, sum, avg) have empty characteristics, I expected to find there at least CONCURRENT and UNORDERED.

At least for doubles summation and averages definitely are ordered and not concurrent because the summation logic uses subresult merging, not a thread-safe accumulator.

You mean that there may be e.g. overflow when order is not preserved during summation - ok I get it now — csharpfolk, Oct 09 '16 at 12:21
finite-precision floating point arithmetic is non-commutative in general, not just due to saturation to infinity. — the8472, Oct 09 '16 at 13:23

What does the Java 8 Collector UNORDERED characteristic mean?

3 Answers3

Linked