
I'm struggling to understand a rule in the Java docs (https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html) about parallel streams, which states: "The Stream.collect(Collector) implementation will only perform a concurrent reduction if: 1. ... 2. ... 3. Either the stream is unordered, or the collector has the Collector.Characteristics.UNORDERED characteristic." We have four cases:

  1. Both are ordered, which clearly will **not** take advantage of concurrent reduction.
  2. Both are unordered, which clearly will take advantage of concurrent reduction.
  3. The stream is unordered and the collector is ordered, which also clearly will take advantage of concurrent reduction.
  4. The stream is ordered and the collector is unordered, which meets the third condition and is supposed to take advantage of concurrent reduction. However, my understanding is that the stream itself is responsible for reordering the outputs of the parallel threads to keep the stream ordered. That means (as I understand it) performance will be affected by this step regardless of whether the collector is ordered or unordered: the stream reorders the outputs, which reduces performance, and the collector then stores the elements as received from the stream, with no further impact on performance.

So the third condition should be either "both of them are unordered" or "the stream is unordered". I know there is something wrong in my logic, but I couldn't find anything on the internet that clarifies this point. Could anyone explain what is wrong in my understanding? Thanks.

I searched the internet for an answer and tried asking ChatGPT (it is very useful for studying, by the way), but I didn't get a satisfying answer.

  • I do not fully understand your reasoning. The part of the doc you cited is only concerned about the collect-operation. We can imagine the `stream` as a queue, and when reducing, each thread takes the next element from the queue if it has nothing to do. I do not understand why it should impact performance of the reduction. – Turing85 Jan 15 '23 at 21:38
  • Thanks for your reply. The answers below from shmosel and Thomas Kläger showed me the point I was missing. Thanks again. – Abdelhamid Marey Jan 15 '23 at 22:37

2 Answers


The collector is not an isolated part of the stream pipeline. The stream knows about the collector and can alter its behavior appropriately. Think of forEach() and forEachOrdered(). When called on a parallel stream (well, technically any stream), forEach() makes no ordering guarantees, despite the orderedness of the stream itself. Conversely, forEachOrdered() can't make any ordering guarantees if the stream itself has no encounter order. Only when using an ordered terminal operation combined with an ordered stream do the elements have to be processed sequentially. The same goes for collectors.
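
The forEach() versus forEachOrdered() contrast above can be tried out with a small sketch. The class and method names here are made up for illustration. The sketch assumes a thread-safe list as the sink, so that only the ordering differs between the two terminal operations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class: same ordered, parallel source, two terminal operations.
class ForEachOrderDemo {

    // forEachOrdered() respects the encounter order of the (ordered) stream.
    static List<Integer> collectWithForEachOrdered(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    // forEach() makes no ordering promise, even though the stream is ordered.
    static List<Integer> collectWithForEach(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.rangeClosed(1, 1000)
                .boxed()
                .collect(Collectors.toList());

        // Guaranteed to match the source order.
        System.out.println(collectWithForEachOrdered(source).equals(source));

        // Same elements, but the order may or may not match from run to run.
        System.out.println(collectWithForEach(source).equals(source));
    }
}
```

The first line printed is always `true`. The second can differ between runs and machines, which is the point: the terminal operation, not the source alone, decides whether the order has to be respected.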

shmosel
  • Many thanks for your answer. I think I found the point I was missing from your answer and the one provided by @Thomas Kläger. Please confirm whether I understand it correctly: the stream evaluates the condition up front, and since the order is not preserved anyway (by using an unordered collection), it does not waste time ordering the outputs from the threads, even if the original source stream was ordered, so it can use concurrent reduction. – Abdelhamid Marey Jan 15 '23 at 22:19
  • Pretty much. It's important to remember that the stream pipeline doesn't actually do anything until the terminal operation sets it into motion. So it's not like the stream and the collector are separate entities collaborating. It's a single terminal operation that takes into account characteristics of the stream and of the collector when plotting its approach. – shmosel Jan 15 '23 at 22:22
  • Many, many thanks. You really saved my day. I'm trying to upvote the answer, but it seems I'm not allowed to until I have enough reputation points. Thanks again. – Abdelhamid Marey Jan 15 '23 at 22:27

the collector has the Collector.Characteristics.UNORDERED characteristic:

The description for this characteristic is

Indicates that the collection operation does not commit to preserving the encounter order of input elements. (This might be true if the result container has no intrinsic order, such as a Set.)

If the result of the collect operation has no intrinsic order, the Stream.collect() operation doesn't need to preserve any order present in the stream (since this order is lost in the result anyway), and it can therefore use a concurrent reduction (which will probably lose that order).
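
As a rough illustration, the characteristics a collector reports can be inspected directly. The sketch below uses made-up class and method names, and its helper checks only the collector side of the rule: per the package documentation, the stream must also be parallel. It uses Collectors.groupingByConcurrent, which reports both CONCURRENT and UNORDERED, on an ordered parallel stream.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the JDK.
class UnorderedCollectorDemo {

    // True if the collector alone satisfies the CONCURRENT and UNORDERED parts of the rule.
    static boolean supportsConcurrentUnordered(Collector<?, ?, ?> collector) {
        Set<Collector.Characteristics> ch = collector.characteristics();
        return ch.contains(Collector.Characteristics.CONCURRENT)
                && ch.contains(Collector.Characteristics.UNORDERED);
    }

    // Ordered source (a List), parallel stream, concurrent and unordered collector.
    static ConcurrentMap<Boolean, List<Integer>> groupByParity(List<Integer> source) {
        return source.parallelStream()
                .collect(Collectors.groupingByConcurrent(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        System.out.println(supportsConcurrentUnordered(Collectors.toList())); // false
        System.out.println(supportsConcurrentUnordered(Collectors.toSet()));  // false: UNORDERED but not CONCURRENT
        System.out.println(supportsConcurrentUnordered(
                Collectors.<Integer, Boolean>groupingByConcurrent(n -> n % 2 == 0))); // true

        List<Integer> source = IntStream.rangeClosed(1, 100)
                .boxed()
                .collect(Collectors.toList());
        ConcurrentMap<Boolean, List<Integer>> byParity = groupByParity(source);

        // Each list holds 50 elements, but their order inside the lists is not guaranteed.
        System.out.println(byParity.get(true).size() + " even, " + byParity.get(false).size() + " odd");
    }
}
```

The grouped lists always contain the right elements, but nothing promises they appear in source order, which matches the description of UNORDERED quoted above.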

Thomas Kläger
  • Many thanks for your answer. I think I found the point I was missing from your answer and the one provided by @shmosel. Please confirm whether I understand it correctly: the stream evaluates the condition up front, and since the order is not preserved anyway (by using an unordered collection), it does not waste time ordering the outputs from the threads, even if the original source stream was ordered, so it can use concurrent reduction. – Abdelhamid Marey Jan 15 '23 at 22:17