My computer has 4 cores. I ran a parallel stream over the list stringList and called reduce with identity = " A". I expected the list to be divided into 4 parts, one per core, with the identity prepended once to each part during the reduction, giving A12 A34 A56 A78. Instead, I got the result shown in the comment at the end of the following code. Why did this happen?

    List<String> stringList = new ArrayList<>();
    stringList.add("1");
    stringList.add("2");
    stringList.add("3");
    stringList.add("4");
    stringList.add("5");
    stringList.add("6");
    stringList.add("7");
    stringList.add("8");

    String result = stringList.stream()
            .parallel()
            .reduce(" A", (s1, s2) -> s1 + s2);
    System.out.println("result:" + result); // result: A1 A2 A3 A4 A5 A6 A7 A8
Lino
AMZ
  • 4 cores might mean 8 hw threads so I'd consider this a valid result. In any case, your accumulator function doesn't match the requirements of `reduce()`. The JavaDoc states: "The accumulator function must be an associative function." and yours clearly isn't, i.e. there's no guarantee on how stream values will be grouped. – Thomas Jan 24 '22 at 08:14
  • “normally, this list should be divided into 4 parts”, where in the specification did you find that statement? – Holger Jan 24 '22 at 09:33
  • @Holger I thought that, because my computer has 4 cores, the list would be divided into 4 sections, but now I realize that each section is itself divided further among several threads. – AMZ Jan 24 '22 at 10:29
  • Since the implementation can’t know in advance whether the workload will be balanced, it will always split into more parts than threads, to allow faster threads to steal work from slower threads. Work-stealing may happen even if the task looks perfectly balanced, e.g. because on the first use, some of the worker threads must be started, which takes significant time, giving the already existing threads a head-start. Or some of your cores are occupied with other work for some time. Generally, you can’t predict everything. Further, nothing specifies that the identity is used once per worker thread. – Holger Jan 24 '22 at 10:36
  • @Holger thanks. Can you point me to a source where I can read more about this? – AMZ Jan 24 '22 at 10:44
  • @Thomas actually their accumulator function _is_ associative, but the “_identity_” is not actually one. – Didier L Jan 24 '22 at 10:46
  • @DidierL yes, you're right. I should rephrase to something like "the expectations are not associative" since the OP seems to assume that `A + B + C + D` are grouped as `(A + B) + (C + D)` with 2 threads (and then uses the "identity" as a group prefix) which may not be the case. – Thomas Jan 24 '22 at 10:58
  • You may read [this question](https://stackoverflow.com/q/34381805/2711488) and both answers regarding the processing. But keep in mind that this is just one side of the unspecified behavior. In principle, the implementation could replace `.reduce(" A", (s1, s2) -> s1 + s2)` with `.reduce((s1, s2) -> s1 + s2).orElse(" A")` without violating the contract. Or with `.map(s -> " A" + s).reduce((s1, s2) -> s1 + s2).orElse(" A")`, as all of those are producing an equivalent result when the reduction function and identity value fulfill their contract. Which they don’t do in this case. – Holger Jan 24 '22 at 11:18
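
The two rewrites named in the last comment can be tried directly. The following is only a sketch: the class name `ReduceRewrites` and the helper names `viaOptional` and `viaMap` are made up for illustration and are not part of the Stream API. It shows that the two forms disagree when `" A"` is passed as the "identity", and agree once a real identity for string concatenation (the empty string) is used.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class ReduceRewrites {
    // The accumulator from the question: plain string concatenation (associative).
    static final BinaryOperator<String> CONCAT = (s1, s2) -> s1 + s2;

    // Rewrite 1 (hypothetical helper): the "identity" only serves as the result for an empty stream.
    static String viaOptional(List<String> list, String identity) {
        return list.parallelStream().reduce(CONCAT).orElse(identity);
    }

    // Rewrite 2 (hypothetical helper): the "identity" is combined with every single element.
    static String viaMap(List<String> list, String identity) {
        return list.parallelStream().map(s -> identity + s).reduce(CONCAT).orElse(identity);
    }

    public static void main(String[] args) {
        List<String> list = List.of("1", "2", "3", "4", "5", "6", "7", "8");

        // " A" is not an identity for concatenation, so the rewrites disagree.
        System.out.println("[" + viaOptional(list, " A") + "]"); // [12345678]
        System.out.println("[" + viaMap(list, " A") + "]");      // [ A1 A2 A3 A4 A5 A6 A7 A8]

        // "" is a real identity, so both rewrites give the same result.
        System.out.println("[" + viaOptional(list, "") + "]");   // [12345678]
        System.out.println("[" + viaMap(list, "") + "]");        // [12345678]
    }
}
```

Because the list is an ordered source and concatenation is associative, these outputs do not depend on how the work is split between threads.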

1 Answer

This is how parallel reduction works. Once multiple threads are involved, the identity is not guaranteed to be used exactly once, as you would see with sequential processing. It is not even guaranteed to be used exactly n times (the size of the list). With a larger list, the output could even look like the following:

A0 A12 A3 A45 A6 A78 A9 A1011 ....
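
The underlying requirement in the `reduce` JavaDoc is that `accumulator.apply(identity, t)` equals `t` for every `t`. Below is a minimal sketch of checking that rule on a few sample values. The class `IdentityCheck` and the helpers `isIdentityFor` and `concatParallel` are hypothetical names chosen for this example, and a check on samples can only disprove the rule, never prove it in general.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class IdentityCheck {
    // The accumulator from the question: string concatenation.
    static final BinaryOperator<String> CONCAT = (s1, s2) -> s1 + s2;

    // Returns true if accumulator.apply(identity, t) equals t for every sample value t.
    static <T> boolean isIdentityFor(T identity, BinaryOperator<T> accumulator, List<T> samples) {
        return samples.stream().allMatch(t -> accumulator.apply(identity, t).equals(t));
    }

    // Parallel reduction with the empty string, which is a real identity for concatenation.
    static String concatParallel(List<String> list) {
        return list.parallelStream().reduce("", CONCAT);
    }

    public static void main(String[] args) {
        List<String> list = List.of("1", "2", "3", "4", "5", "6", "7", "8");

        System.out.println(isIdentityFor(" A", CONCAT, list)); // false: " A" + "1" is not "1"
        System.out.println(isIdentityFor("", CONCAT, list));   // true:  "" + "1" is "1"

        // With a real identity, the result no longer depends on how often it is applied.
        System.out.println(concatParallel(list));              // 12345678
    }
}
```

With `""` as the identity, it does not matter how many chunks the list is split into or how often the identity is applied, which is why the contract leaves both unspecified.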

Nikolas Charalambidis
  • Actually, it’s never specified how often the identity value is used, not even in the sequential case. – Holger Jan 24 '22 at 09:35