5

I'm learning Java 8, in Reduction operations part in java.util.stream's package summary, it says:

More formally, the identity value must be an identity for the combiner function. This means that for all u, combiner.apply(identity, u) is equal to u. Additionally, the combiner function must be associative and must be compatible with the accumulator function: for all u and t, combiner.apply(u, accumulator.apply(identity, t)) must be equals() to accumulator.apply(u, t).

I don't understand why the identity value must be an identity for the combiner function. I think "combiner function must be associative and must be compatible" is enough to produce the same result either the stream is serial or parallel.

For example, I have a stream that has four elements e1, e2, e3, e4. If it's a serial stream, the result is identity ac e1 ac e2 ac e3 ac e4 (ac means accumulator function). If it's a parallel stream, the four elements can be split into two parts, [e1, e2] and [e3, e4], so the result is (identity ac e1 ac e2) co (identity ac e3 ac e4).

If given "the combiner function is associative and is compatible with the accumulator function", we can infer that "identity ac e1 ac e2 ac e3 ac e4" is equal to "(identity ac e1 ac e2) co (identity ac e3 ac e4)":

identity ac e1 ac e2 ac e3 ac e4
= (identity ac e1 ac e2 ac e3) co (identity ac e4) // because of compatibility
= (identity ac e1 ac e2) co (identity ac e3) co (identity ac e4) // because of compatibility
= (identity ac e1 ac e2) co ((identity ac e3) co (identity ac e4)) // because of associative property
= (identity ac e1 ac e2) co ((identity ac e3) ac e4) // because of compatibility
= (identity ac e1 ac e2) co (identity ac e3 ac e4)

So, why must the identity value be an identity for the combiner function?


Related questions:

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
Jason Law
  • 965
  • 1
  • 9
  • 21
  • 10
    It doesn’t matter whether you write “the identity value must be an identity for the combiner function” or “the identity value must be an identity for the accumulator function”, as the combiner function and the accumulator function must be compatible. But at least one statement must be made and since the documentation makes exactly one statement (no redundancy), I don’t see the problem. – Holger Jan 31 '20 at 11:39
  • 1
    @Holger How do you define "the identity value is an identity for the accumulator function"? The identity value is of type `U`, and the element in the stream is of type `T`. And, why do we need “the identity value must be an identity for the combiner function” or “the identity value must be an identity for the accumulator function”? – Jason Law Jan 31 '20 at 11:54
  • 6
    1.) the first argument to the accumulator function has the same type as the identity value. 2) You are basically assuming that the value behaves as intended but are asking why we need to define that the value has to behave as intended. If you think, it was unnecessary to define it to be an identity value, why do you keep writing “`identity`” in your example? Replace it by an arbitrary value, say `e0`. Then, explain, why “*‘`e0` ac `e1` ac `e2` ac `e3` ac `e4`’* is equal to *‘(`e0` ac `e1` ac `e2`) co (`e0` ac `e3` ac `e4`)’*”. – Holger Jan 31 '20 at 12:04
  • @Holger your first point doesn't make sense. The accumulator function has type `(U, T) -> U`, so for `i` (with type `U`) to be an identity then `forall t:T i ac t = t`. The LHS and RHS of this equality have different types (`U` and `T` respectively), so it can't be true (aside from the special case where the types are the same). – jon hanson Jan 31 '20 at 12:51
  • 1
    @jon-hanson perhaps, [this answer](https://stackoverflow.com/a/59785061/2711488) helps, as the associativity constraint specified for the accumulator creates a similar confusion. It helps to consider, what has been cited even in this question, *the combiner function must be “compatible” with the accumulator function*. Conceptionally, there’s is only one Reduction function, as is literally with the other two `reduce` methods, but this three-arg overload allows to optimize the special case of map/reduce. The first argument must be an identity element to that reduction function. – Holger Jan 31 '20 at 13:05
  • 1
    @Holger I'm not confused, I'm simply addressing your usage of the statement “the identity value must be an identity for the accumulator function”. This isn't possible or even meaningful, because of, as JasonLaw also notes, the type mismatch. – jon hanson Jan 31 '20 at 13:35
  • 1
    @jon-hanson once you understood that there’s a conceptional Reduction function which is just expressed using two objects, you understand that the only thing that is not meaningful, is this pointless discussion, trying to nitpick on a wording. It doesn’t matter which parameter you are referring to, when both *must be compatible*, as they express the same. I invite you to try to find a real life example of a valid combination of accumulator and combiner function where it is impossible to determine the correctness of the identity element by only looking at the accumulator function. – Holger Jan 31 '20 at 13:44

1 Answers1

2

I don't understand why the identity value must be an identity for the combiner function.

It has to be, because otherwise if the provided stream is empty, no result could be generated. But the result of the reduction with identity is not optional. You can define a Java 8 reduction without an identity:

OptionalInt sum = Arrays.stream(new int[] { 0, 2, 6 }).reduce((a, b) -> a + b);

However if you do so, the empty stream will not return a valid result, as the reduction operation does not 'know' that 0 is the neutral element of the 'sum' operation you defined. Im mathematical terms, you are defining the identity element of the operation, so that there is a fallback in case the operation is not applicable at all. Visually you can describe the operation like this:

reduction = op(identity, a, b, c ...)

That is why you receive an OptionalInt, meaning that no integer value might have been calculated.

For a multiplication the identity is 1, for a sum 0 etc. As the identity must be neutral, it is not a fallback in the strictest sense, as a fallback can be anything, including -1 or any other value (must not be a numeric one). Thus OptionalInt offers a great alternative, because with an optional, you can apply the or(fallback) method, which does not need to be a neutral element of the reduction.

TreffnonX
  • 2,924
  • 15
  • 23