In the absence of special pleading, stream operations must behave as if the elements are processed in the encounter order of the source. For some operations -- such as reduction with an associative operation -- one can obey this constraint and still get efficient parallel execution. For others, though, this constraint is very limiting. And, for some problems, this constraint isn't meaningful to the user. Consider the following stream pipeline:
people.stream()
      .collect(groupingBy(Person::getLastName,
                          mapping(Person::getFirstName, toList())));
Is it important that the list of first names associated with "Smith" appear in the map in the order they appeared in the initial stream? For some problems, yes, for some no -- we don't want the stream library guessing for us. An unordered collector says that it's OK to insert the first names into the list in an order inconsistent with the order in which Smith-surnamed people appear in the input source. By relaxing this constraint, sometimes (not always), the stream library can give a more efficient execution.
For example, if you didn't care about this order preservation, you could execute it as:
people.parallelStream()
      .collect(groupingByConcurrent(Person::getLastName,
                                    mapping(Person::getFirstName, toList())));
The concurrent collector is unordered, which permits the optimization of sharing an underlying ConcurrentMap, rather than having O(log n) map-merge steps. Relaxing the ordering constraint enables a real algorithmic advantage -- but we can't assume the constraint doesn't matter; we need the user to tell us this. Using an UNORDERED collector is one way to tell the stream library that these optimizations are fair game.
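To make the difference concrete, here is a minimal, self-contained sketch that runs both collectors over the same input and inspects their characteristics. The Person class, the sample data, and the helper names (samplePeople, orderedCollector, concurrentCollector, sorted) are hypothetical stand-ins invented for this illustration; only the java.util.stream calls come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class UnorderedGroupingDemo {

    // Hypothetical stand-in for the Person type used in the pipelines above.
    static final class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // Made-up sample data; the Smiths appear in the order Alice, Bob, Carol.
    static List<Person> samplePeople() {
        return Arrays.asList(
                new Person("Alice", "Smith"),
                new Person("Dan", "Jones"),
                new Person("Bob", "Smith"),
                new Person("Erin", "Jones"),
                new Person("Carol", "Smith"));
    }

    // Ordered collector: each list of first names follows encounter order.
    static Collector<Person, ?, Map<String, List<String>>> orderedCollector() {
        return Collectors.groupingBy(Person::getLastName,
                Collectors.mapping(Person::getFirstName, Collectors.toList()));
    }

    // Concurrent, unordered collector: threads insert into one shared map.
    static Collector<Person, ?, ConcurrentMap<String, List<String>>> concurrentCollector() {
        return Collectors.groupingByConcurrent(Person::getLastName,
                Collectors.mapping(Person::getFirstName, Collectors.toList()));
    }

    // Returns a sorted copy so two lists can be compared regardless of order.
    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Person> people = samplePeople();

        Map<String, List<String>> ordered =
                people.parallelStream().collect(orderedCollector());
        ConcurrentMap<String, List<String>> unordered =
                people.parallelStream().collect(concurrentCollector());

        // Same names per key in both maps; only the ordered result
        // promises that they appear in encounter order.
        System.out.println("groupingBy:           " + ordered);
        System.out.println("groupingByConcurrent: " + unordered);

        // The characteristics are how a collector declares what it permits.
        System.out.println("groupingBy characteristics:           "
                + orderedCollector().characteristics());
        System.out.println("groupingByConcurrent characteristics: "
                + concurrentCollector().characteristics());
    }
}
```

With a list this small the two maps will often print identically; the point is that only the groupingBy result guarantees the encounter order of the names within each list, while the groupingByConcurrent result guarantees only the same contents.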