
I really want to know the exact difference between Stream.reduce() and Stream.parallel().reduce().

To clear things up, I wrote a small program and found that the results are not equal for the same values.

import java.util.stream.Stream;

public class Test {

    public static void main(String[] args) {
        // Sequential reduce, with 5 passed as the "identity"
        int a = Stream.of(1, 2, 3).map(i -> i * 10).reduce(5, (abc, cde) -> abc + cde);
        // Same pipeline, but parallel
        int b = Stream.of(1, 2, 3).map(i -> i * 10)
                .parallel().reduce(5, (abc, cde) -> abc + cde);
        System.out.println(a == b); // false
    }
}

So, does this mean that the two are different? If so, please help me understand how they differ in functionality.

Stefan Zobel
T-Bag

1 Answer


It seems that you are misusing the reduce function. When using reduce with an identity value, you have to make sure that the value really is an identity for the associative reduce function.

See the full documentation, and a good explanation of what reduce does here. The reduce javadoc says:

The identity value must be an identity for the accumulator function. This means that for all t, accumulator.apply(identity, t) is equal to t. The accumulator function must be an associative function.

In your case, 5 is not an identity for the + function you are using for reducing, which leads to strange results with a parallel reduce. 0 is the identity of addition, so a correct way to compute this would be to add 5 to the list of values and use reduce(0, (x, y) -> x + y). Additionally, since you are reducing a stream of Integer to an Integer, you can simply use reduce((x, y) -> x + y), which returns an Optional<Integer>.

The reason is that a parallel reduce relies on the identity being a mathematical identity in order to optimize for parallel execution. In your case, it will inject the identity value into the computation multiple times.
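As a minimal sketch of the fix described above, the following keeps 0 as the identity and adds the offset of 5 exactly once, outside the reduction. The class name ReduceIdentityDemo and the helper sumPlusOffset are made up for illustration; they do not come from the question.

```java
import java.util.stream.Stream;

public class ReduceIdentityDemo {

    // Sums the mapped values with 0 as the identity, then adds the offset once.
    // (Hypothetical helper, named here only for the example.)
    static int sumPlusOffset(boolean parallel, int offset) {
        Stream<Integer> values = Stream.of(1, 2, 3).map(i -> i * 10);
        if (parallel) {
            values = values.parallel();
        }
        // 0 is a true identity for +, so splitting the work cannot change the sum.
        return offset + values.reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        int sequential = sumPlusOffset(false, 5);
        int parallel = sumPlusOffset(true, 5);
        System.out.println(sequential == parallel); // true, both are 65
    }
}
```

Because the offset no longer takes part in the reduction, the result does not depend on how the stream happens to be split.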

Cloud
tonio
  • Thanks for the information. So this means they are not the same function? – T-Bag Oct 16 '17 at 10:24
  • No, they are the same method. Read the last paragraph of the answer. ---- The linear case starts with 5, adds 10, adds 20, adds 30, and ends up with **65**. ---- In the parallel case, it might split the numbers into [1, 2] and [3]. So now it starts with 5, adds 10, adds 20; and in parallel it starts with 5 and adds 30. Now it sums up the sub-results of 35 and 35 and ends up with **70**. ---- When using reduce with two parameters, the first one must have no effect in the reduce function, especially when using parallel. But 5 **does** have an effect when using `+`. – Malte Hartwig Oct 16 '17 at 12:48
  • In principle, it would also be legal for a stream to just return the sum of all elements when you say `reduce(5, (x,y) -> x+y)`, without ever adding `5` to it. Even for a sequential stream. – Holger Oct 16 '17 at 15:27
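
The arithmetic in the comments can be reproduced with a hand-written model of a chunked reduction. This is only a sketch: the class ChunkedReduceSketch and the method reduceInChunks are hypothetical, and the chunk boundaries are chosen by hand, whereas a real parallel stream decides its own splits. The model seeds every chunk with the "identity" and then combines the partial results with the same operator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class ChunkedReduceSketch {

    // Reduces each chunk starting from `identity`, then combines the
    // partial results with the same operator. Not the JDK's algorithm,
    // just a model of "one identity per chunk".
    static int reduceInChunks(List<List<Integer>> chunks, int identity,
                              BinaryOperator<Integer> op) {
        Integer total = null;
        for (List<Integer> chunk : chunks) {
            Integer partial = identity;
            for (Integer value : chunk) {
                partial = op.apply(partial, value);
            }
            total = (total == null) ? partial : op.apply(total, partial);
        }
        return total == null ? identity : total;
    }

    public static void main(String[] args) {
        BinaryOperator<Integer> add = Integer::sum;
        List<List<Integer>> one = Arrays.asList(Arrays.asList(10, 20, 30));
        List<List<Integer>> two = Arrays.asList(Arrays.asList(10, 20), Arrays.asList(30));

        System.out.println(reduceInChunks(one, 5, add)); // 65: 5 is added once
        System.out.println(reduceInChunks(two, 5, add)); // 70: 5 is added twice
        System.out.println(reduceInChunks(two, 0, add)); // 60: 0 never changes the sum
    }
}
```

With 0 as the identity the result is 60 for any chunking, which is why the identity requirement matters for parallel execution.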