
Can someone tell me why this is happening, and whether it's expected behaviour or a bug?

List<Integer> a = Arrays.asList(1, 1, 3, 3);

a.parallelStream()
        .filter(Objects::nonNull)
        .filter(value -> value > 2)
        .reduce(1, Integer::sum);

Answer: 10

But if I use `stream` instead of `parallelStream`, I get the answer I expected: 7.

  • check [documentation](https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/util/stream/IntStream.html#reduce(int,java.util.function.IntBinaryOperator)) of `reduce`: *"...The identity value must be an identity for the accumulator function. This means that for all `x`, `accumulator.apply(identity, x)` is equal to `x`..."* - not the case for `1` and `Integer::sum` –  Mar 01 '21 at 19:51
  • to be able to use parallel streams, the reducing is done in parts, each part is adding one (probably once for each element) - try it with `0` and the result should be correct –  Mar 01 '21 at 20:01

1 Answer


The first argument to reduce is called "identity" and not "initialValue".

1 is not an identity for addition; it is the identity for multiplication.

So you need to provide 0 if you want to sum the elements.
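
As a minimal sketch of that fix, the example below uses 0 as the identity and runs the same pipeline sequentially and in parallel. The class name `IdentityReduceExample` and the helper `sumAbove` are hypothetical names introduced only for illustration. Note that the sum of the filtered elements is 6; if the extra 1 is really wanted, it has to be added outside of `reduce`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

public class IdentityReduceExample {

    // Hypothetical helper: sums all non-null values greater than `threshold`,
    // using 0 (the identity for addition) as the first argument to reduce.
    static int sumAbove(List<Integer> values, int threshold, boolean parallel) {
        Stream<Integer> stream = parallel ? values.parallelStream() : values.stream();
        return stream
                .filter(Objects::nonNull)
                .filter(value -> value > threshold)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 3, 3);

        System.out.println(sumAbove(a, 2, false));    // sequential: 6
        System.out.println(sumAbove(a, 2, true));     // parallel:   6
        System.out.println(1 + sumAbove(a, 2, true)); // initial value added outside reduce: 7
    }
}
```

With a proper identity, the sequential and the parallel stream give the same result, no matter how the work is split.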


Java uses "identity" instead of "initialValue" because this little trick makes it easy to parallelize reduce.


In parallel execution, each thread runs the reduce on a part of the stream, and when the threads are done, their partial results are combined using the very same reduce function.

So, simplified, it will look something like this:

mainThread:
  start thread1;
  start thread2;
  wait till both are finished;

thread1:
  return sum(1, 3); // your reduce function applied to a part of the stream

thread2:
  return sum(1, 3);

// when thread1 and thread2 are finished:
mainThread:
  return sum(sum(1, resultOfThread1), sum(1, resultOfThread2));
  = sum(sum(1, 4), sum(1, 4))
  = sum(5, 5)
  = 10

I hope you can see what happens and why the result is not what you expected.
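
The same idea can be sketched as runnable code. This is only a simulation, not what the stream library actually does: it assumes the four-element list is split into one chunk per element and that the filter has already removed the 1s, so two chunks are empty. How a parallel stream really splits its input is an implementation detail and may differ. `ChunkedReduceSketch` and `reduceInChunks` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BinaryOperator;

public class ChunkedReduceSketch {

    // Hypothetical helper: reduces every chunk starting from `identity`,
    // then combines the partial results with the same operator.
    static int reduceInChunks(List<List<Integer>> chunks, int identity, BinaryOperator<Integer> op) {
        Integer combined = null;
        for (List<Integer> chunk : chunks) {
            Integer partial = identity; // each chunk starts from the identity, even an empty one
            for (Integer value : chunk) {
                partial = op.apply(partial, value);
            }
            combined = (combined == null) ? partial : op.apply(combined, partial);
        }
        return (combined == null) ? identity : combined;
    }

    public static void main(String[] args) {
        // Assumed split of [1, 1, 3, 3] after the filter `value > 2`: one chunk per element.
        List<List<Integer>> fourChunks = Arrays.asList(
                Collections.<Integer>emptyList(),
                Collections.<Integer>emptyList(),
                Arrays.asList(3),
                Arrays.asList(3));
        // A sequential stream behaves like a single chunk.
        List<List<Integer>> oneChunk = Arrays.asList(Arrays.asList(3, 3));

        System.out.println(reduceInChunks(fourChunks, 1, Integer::sum)); // 1 + 1 + 4 + 4 = 10
        System.out.println(reduceInChunks(fourChunks, 0, Integer::sum)); // 0 + 0 + 3 + 3 = 6
        System.out.println(reduceInChunks(oneChunk, 1, Integer::sum));   // 1 + 3 + 3 = 7
    }
}
```

With 1 as the "identity", the result depends on the number of chunks, because every chunk contributes its own 1. With 0, the chunking no longer matters.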

Benjamin M
  • But then why is the result different in case of stream? – Utkarsh Yadav Mar 01 '21 at 19:57
  • Because non-parallel stream works sequentially. I'll provide an example in my answer in a few minutes – Benjamin M Mar 01 '21 at 19:59
  • Thanks @Benjamin this was quite helpful. – Utkarsh Yadav Mar 01 '21 at 20:20
  • Just interesting: the algorithm is a bit different - `identity` is being added even if values are filtered out: `IntStream.range(0, 100).parallel().filter(v -> v < 0).reduce(1,Integer::sum)` (`36` with jshell 1.15) –  Mar 01 '21 at 22:38
  • @user15244370 Yeah. And the result depends on the number of threads being used. Your example gives: `4` for 1 thread, `12` for 2 threads, `20` for 4 threads, and `36` for 8 threads. I guess there's a chunk size in addition to the thread count, so that every thread only computes N elements at a time, and after it's finished it will get the next N elements. – Benjamin M Mar 01 '21 at 22:51