When using the reduce() operation on a parallel stream, the OCP exam book states that there are certain principles the reduce() arguments must adhere to. Those principles are the following:
- The identity must be defined such that for all elements in the stream u, combiner.apply(identity, u) is equal to u.
- The accumulator operator op must be associative and stateless such that (a op b) op c is equal to a op (b op c).
- The combiner operator must also be associative and stateless and compatible with the identity, such that for all of u and t, combiner.apply(u, accumulator.apply(identity, t)) is equal to accumulator.apply(u, t).
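To make the three rules concrete, here is a minimal sketch that checks each one for a few sample values. The class `ReduceLawCheck` and its helper methods are hypothetical names made up for illustration; they are not from the book or the JDK. Passing a check for some sample values does not prove that a rule holds for all values, but a single failure is enough to show that it is violated.

```java
import java.util.function.BinaryOperator;

class ReduceLawCheck {

    // True if op.apply(identity, u) equals u for this sample value.
    static <T> boolean isIdentityFor(T identity, BinaryOperator<T> op, T u) {
        return op.apply(identity, u).equals(u);
    }

    // True if (a op b) op c equals a op (b op c) for these sample values.
    static <T> boolean isAssociativeFor(BinaryOperator<T> op, T a, T b, T c) {
        T leftGrouped = op.apply(op.apply(a, b), c);
        T rightGrouped = op.apply(a, op.apply(b, c));
        return leftGrouped.equals(rightGrouped);
    }

    // True if combiner.apply(u, accumulator.apply(identity, t))
    // equals accumulator.apply(u, t) for these sample values.
    static <T> boolean isCompatibleFor(T identity, BinaryOperator<T> accumulator,
                                       BinaryOperator<T> combiner, T u, T t) {
        return combiner.apply(u, accumulator.apply(identity, t))
                .equals(accumulator.apply(u, t));
    }

    public static void main(String[] args) {
        BinaryOperator<Integer> add = (a, b) -> a + b;
        BinaryOperator<Integer> subtract = (a, b) -> a - b;
        BinaryOperator<String> concat = String::concat;

        // (1 + 2) + 3 = 6 and 1 + (2 + 3) = 6
        System.out.println(isAssociativeFor(add, 1, 2, 3));      // true
        // (1 - 2) - 3 = -4 but 1 - (2 - 3) = 2
        System.out.println(isAssociativeFor(subtract, 1, 2, 3)); // false

        // "X".concat("w") is "Xw", not "w"; "".concat("w") is "w"
        System.out.println(isIdentityFor("X", concat, "w"));     // false
        System.out.println(isIdentityFor("", concat, "w"));      // true

        // Using subtraction as both accumulator and combiner:
        // 10 - (0 - 3) = 13 but 10 - 3 = 7
        System.out.println(isCompatibleFor(0, subtract, subtract, 10, 3)); // false
    }
}
```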
The book gives two examples to illustrate these principles; please see the code below.
Example for associativity:
    System.out.println(
        Arrays.asList(1, 2, 3, 4, 5, 6)
            .parallelStream()
            .reduce(0, (a, b) -> (a - b)));
What the book says about this:
It may output -21, 3, or some other value as the accumulator function violates the associativity property.
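A rough sketch of how values such as -21 and 3 could arise, with no threads involved. It assumes that a parallel reduce folds each chunk starting from the identity and then merges the partial results with the same operator (the two-argument reduce has no separate combiner). The splitting strategies and the class `SubtractionSplitDemo` are hypothetical and chosen only for illustration; the real splitting done by the stream implementation may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class SubtractionSplitDemo {

    // Folds left to right from the identity, like a sequential reduce.
    static int fold(int identity, BinaryOperator<Integer> op, List<Integer> values) {
        int result = identity;
        for (int v : values) {
            result = op.apply(result, v);
        }
        return result;
    }

    // Hypothetical split: fold each half on its own, then merge the two results.
    static int foldInTwoHalves(int identity, BinaryOperator<Integer> op, List<Integer> values) {
        int mid = values.size() / 2;
        int left = fold(identity, op, values.subList(0, mid));
        int right = fold(identity, op, values.subList(mid, values.size()));
        return op.apply(left, right);
    }

    // Hypothetical split: keep halving down to single elements, merging on the way up.
    static int foldRecursively(int identity, BinaryOperator<Integer> op, List<Integer> values) {
        if (values.size() <= 1) {
            return fold(identity, op, values);
        }
        int mid = values.size() / 2;
        int left = foldRecursively(identity, op, values.subList(0, mid));
        int right = foldRecursively(identity, op, values.subList(mid, values.size()));
        return op.apply(left, right);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        BinaryOperator<Integer> subtract = (a, b) -> a - b;
        BinaryOperator<Integer> add = (a, b) -> a + b;

        System.out.println(fold(0, subtract, numbers));            // -21
        System.out.println(foldInTwoHalves(0, subtract, numbers)); // 9, since -6 - (-15) = 9
        System.out.println(foldRecursively(0, subtract, numbers)); // 3

        // With addition, the grouping does not change the result.
        System.out.println(fold(0, add, numbers));                 // 21
        System.out.println(foldInTwoHalves(0, add, numbers));      // 21
        System.out.println(foldRecursively(0, add, numbers));      // 21
    }
}
```

In this sketch the different results come purely from how the operations are grouped, not from any race condition.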
Example for the identity requirement:
    System.out.println(
        Arrays.asList("w", "o", "l", "f")
            .parallelStream()
            .reduce("X", String::concat));
What the book says about this:
You can see other problems if we use an identity parameter that is not truly an identity value. It can output XwXoXlXf. As part of the parallel process, the identity is applied to multiple elements in the stream, resulting in very unexpected data.
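The output quoted above can be reproduced without threads by a small simulation. It assumes a worst case in which every element ends up in its own chunk, each chunk is seeded with the identity, and the partial results are concatenated in encounter order. The class `IdentityPerChunkDemo` and its methods are hypothetical names used only for this sketch.

```java
import java.util.Arrays;
import java.util.List;

class IdentityPerChunkDemo {

    // Sequential behaviour: the identity is applied exactly once, at the start.
    static String reduceSequentially(String identity, List<String> values) {
        String result = identity;
        for (String v : values) {
            result = result.concat(v);
        }
        return result;
    }

    // Hypothetical worst case: each element is its own chunk, so the
    // identity is applied once per element before the partial results are joined.
    static String reducePerElement(String identity, List<String> values) {
        String combined = null;
        for (String v : values) {
            String partial = identity.concat(v);
            combined = (combined == null) ? partial : combined.concat(partial);
        }
        return (combined == null) ? identity : combined;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("w", "o", "l", "f");

        System.out.println(reduceSequentially("X", letters)); // Xwolf
        System.out.println(reducePerElement("X", letters));   // XwXoXlXf

        // The empty string satisfies "".concat(u).equals(u), so both agree.
        System.out.println(reduceSequentially("", letters));  // wolf
        System.out.println(reducePerElement("", letters));    // wolf
    }
}
```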
I don't understand those examples. In the accumulator example, the accumulator starts with 0 - 1, which is -1, then -1 - 2, which is -3, then -6, and so on all the way to -21. I understand that, because the generated list isn't synchronized, the results may be unpredictable because of possible race conditions, but why isn't the accumulator associative? Wouldn't (a+b) cause unpredictable results too? I really don't see what's wrong with the accumulator used in the example and why it's not associative, but then again I still don't exactly understand what the "associative principle" means.
I don't understand the identity example either. I understand that the result could indeed be XwXoXlXf if 4 separate threads were to start accumulating with the identity at the same time, but what does that have to do with the identity parameter itself? What exactly would be a proper identity to use then?
I was wondering if anyone could enlighten me a bit more on these principles.
Thank you