
The question is about the java.util.stream.Stream.reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner) method.

One of the requirements is that the combiner function must be compatible with the accumulator function; for all u and t, the following must hold:

 combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t) (*) 

If the combiner and accumulator are the same (and the identity really is an identity for that operation), the above equality automatically holds.

A BinaryOperator actually extends BiFunction, so I can use one where a BiFunction is required. If U and T are identical, the following is always legal:

BinaryOperator<T> op = (x, y) -> something;

stream.reduce(id, op, op);
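
For instance, a concrete instance of the above, with arbitrary values:

import java.util.function.BinaryOperator;
import java.util.stream.Stream;

BinaryOperator<Integer> op = Integer::sum;
int sum = Stream.of(1, 2, 3).reduce(0, op, op); // 6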

Of course, one cannot always use the combiner as the accumulator since, in the general case, they serve different purposes and have different Java types.

My question

Is there an example of stream reduction with distinct combiner and accumulator?

Also, I'm not interested in trivial examples, but in natural examples that one can encounter in practice while doing reduction on parallel streams.

For trivial examples, there are many tutorials, like this one.

Why am I asking this question

Basically, this reduction method exists for the sake of parallel streams. It seems to me that condition (*) is so strong that, in practice, it renders this form of reduce useless, since reduction operations rarely fulfill it.

2 Answers


"If the combiner and accumulator are the same"? You are confusing things here.

The accumulator transforms from X to Y, for example (starting from the identity), while the combiner merges two Ys into one. Also notice that one is a BiFunction and the other a BinaryOperator (which is really a BiFunction<T, T, T>).

Is there an example of stream reduction with distinct combiner and accumulator?

These look pretty different to me:

    Stream.of("1", "2")
          .reduce(0, (x, y) -> x + y.length(), Integer::sum);
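
To see why this pair is compatible, you can check rule (*) by hand; a quick sketch (the values of u and t are arbitrary):

    import java.util.function.BiFunction;
    import java.util.function.BinaryOperator;

    BiFunction<Integer, String, Integer> accumulator = (x, y) -> x + y.length();
    BinaryOperator<Integer> combiner = Integer::sum;

    int u = 5;        // some partial result
    String t = "abc"; // some stream element

    int lhs = combiner.apply(u, accumulator.apply(0, t)); // 5 + (0 + 3) = 8
    int rhs = accumulator.apply(u, t);                    // 5 + 3 = 8
    // lhs == rhs for any u and t, so the contract holds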

I think you might be confused with things like:

Stream.of("1", "2")
      .reduce("", String::concat, String::concat);

How is this possible?

BiFunction<String, String, String> bi = String::concat;

Well there is a hint here.
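
That is, the same method reference satisfies both parameter types, which is why it can be passed twice; for example:

import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.stream.Stream;

// BinaryOperator<String> is just a BiFunction<String, String, String>,
// so String::concat fits both
BiFunction<String, String, String> accumulator = String::concat;
BinaryOperator<String> combiner = String::concat;

String joined = Stream.of("1", "2").reduce("", accumulator, combiner); // "12"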

EDIT

Addressing the part where "different" means different operations: the accumulator might sum, while the combiner might multiply. This is exactly what the rule:

combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t)

is about: it protects against two functions that are each associative, but perform different operations. Let's take the example of two lists (equal, but in a different order). This, btw, would be a lot more fun with Set::of from java-9, which adds internal randomization, so theoretically, for the exact same input, you could get different results on the same VM from run to run. But to keep it simple:

List.of("a", "bb", "ccc", "dddd");
List.of("dddd", "a", "bb", "ccc");

And we want to perform:

....stream()
   .parallel()
   .reduce(0,
          (x, y) -> x + y.length(),
          (x, y) -> x * y);

Under the current implementation, this will yield the same result for both lists; but that is an implementation artifact.

There is nothing stopping an internal implementation from saying: "I will split the list into the smallest chunks possible, but not smaller than two elements each". In such a case, the input could have been split like this:

["a",    "bb"]     ["ccc", "dddd"]
["dddd", "a" ]     ["bb" , "ccc" ]   

Now, "accumulate" those splits:

0 + "a".length   = 1 ; 1 + "bb".length   = 3 // thus chunk result is 3
0 + "ccc".length = 3 ; 3 + "dddd".length = 7 // thus chunk result is 7 

Now we "combine" these chunks: 3 * 7 = 21.

I am pretty sure you can already see that the second list, in such a scenario, would result in 25; as such, different operations in the accumulator and combiner can produce wrong results.
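
You can also check mechanically that this accumulator/combiner pair breaks rule (*); a small sketch with arbitrary values:

import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

BiFunction<Integer, String, Integer> accumulator = (x, y) -> x + y.length();
BinaryOperator<Integer> combiner = (x, y) -> x * y;

int u = 3;
String t = "dddd";

int lhs = combiner.apply(u, accumulator.apply(0, t)); // 3 * (0 + 4) = 12
int rhs = accumulator.apply(u, t);                    // 3 + 4 = 7
// lhs != rhs, so the contract is violated and the parallel result is unspecified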

    I only said the combiner and accumulator can be the same, not that I can **always** use the accumulator as combiner. –  Nov 22 '19 at 08:15
  • @EugenCovaci didn't you answer your question yourself with the edit? 1) of course you can use `stream.reduce(id, op, op);` if you are reducing to the same type that the Stream has (this is what happens internally when you do `stream.reduce(id, op);` anyway). 2) if you want to reduce to a _different_ type than the Stream has, those 2 will be different. – Eugene Nov 22 '19 at 11:59
  • @EugenCovaci or do you mean "different" in the sense that they perform different operations? like one does a `sum`, the other does a `multiply`? – Eugene Nov 22 '19 at 12:09
  • Yes, in the sense that they perform different operations. –  Nov 22 '19 at 12:52
  • @EugenCovaci then in such a case this question is _only_ interesting if _both_ the `accumulator` and `combiner` are associative, but entail a different operation (`sum` and `multiply` for example). Working on an edit... will ping when done. – Eugene Nov 22 '19 at 15:33
  • Thank you very much for your interest. Actually, the associativity is a required property for reduction. –  Nov 22 '19 at 15:35
  • I think `(x, y) -> x + y.length()` is not associative, it breaks the reduction contract. –  Nov 22 '19 at 16:56
  • @EugenCovaci can you _prove_ that? Is that any different than : `Stream.of("one", "two").map(String::length).reduce(0, Integer::sum)`? – Eugene Nov 22 '19 at 16:58
  • I think you need to read @Eugene's code snippet example again. What _type_ is the identity being passed to reduce? And hence what _type_ is `x` in `(x, y) -> x + y.length()`? – user31601 Nov 22 '19 at 17:20
  • Right, my example was wrong. The question is: how can the associativity be formulated for the accumulator? –  Nov 22 '19 at 17:39
  • `(a op b) op c == a op (b op c)` doesn't make sense since `b` has to be String from `(a op b)` and int from `(b op c)` –  Nov 22 '19 at 17:50
  • Indeed, I see no need for the accumulator to be associative - it doesn't even need to be an operator. It does have to be _commutative_ in the second argument (i.e. `op(op(a, b), c) === op(op(a, c), b)`). – user31601 Nov 22 '19 at 17:55
  • From the documentation: accumulator – an associative, non-interfering, stateless function for incorporating an additional element into a result –  Nov 22 '19 at 18:01
  • I would assume they meant to say commutative there. Otherwise, I literally don't know what they mean. – user31601 Nov 22 '19 at 18:06
  • @user31601 it _might_ have to do with the fact that internally combiner is the same as the accumulator, when you use `reduce(Identity, accumulator)`. – Eugene Nov 22 '19 at 18:08
  • Yes - I can see that associativity would be required for the 2-argument overload, because the combiner _does_ need to be associative. – user31601 Nov 22 '19 at 18:09
  • @user31601 exactly! So document it as such and move on; probably much simpler on the implementation side. – Eugene Nov 22 '19 at 18:27
  • I actually think this a case of copy-paste laziness on the part of the API developers. The 3 argument overload won't delegate to the 2-argument overload (more likely the other way round), so there's absolutely no reason why the accumulator needs to be associative in the 3-arg overload. Indeed, when `T` and `U` are different, the concept of associativity doesn't even make sense. – user31601 Nov 22 '19 at 18:39

So, here are a few examples. Some of these may count as "trivial", particularly where there's already a function to do it for you.

An example where T and U are the same

These are quite difficult to come up with, and are a bit contrived, as they generally involve assuming that the elements of the stream and the object being accumulated have different meanings, even though they have the same type.

Counting

If we have a stream of integers, we could count them using reduce:

stream.reduce(0, (count, item) -> count+1, (a, b) -> a+b);

Obviously, we could just use stream.count() here, but I'm willing to bet count uses the 3 argument version of reduce internally.
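
Run in parallel, each chunk counts its own elements with the accumulator, and the combiner adds the per-chunk counts; a quick sketch with arbitrary values:

import java.util.stream.Stream;

// each chunk counts its own elements; the combiner sums the chunk counts
int n = Stream.of(1, 2, 3, 4, 5)
              .parallel()
              .reduce(0, (count, item) -> count + 1, (a, b) -> a + b); // 5

Note that the accumulator could not double as the combiner here: applied to two partial counts, (count, item) -> count + 1 would just add 1 instead of the second chunk's count.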

An example where T and U are different

This gives us quite a lot of freedom, and obviously, the accumulator and combiner are never going to be the same here, as they have different types.

One of the most common ways we may want to aggregate is gathering into a collection. We could use reduce for that, but since Java collection types are typically mutable, using collect will generally be more efficient. This rule applies generally: if the result type is mutable, use collect rather than reduce.
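
For instance, gathering into a list with the three-argument collect, which takes a supplier for a fresh container, an accumulator that mutates it, and a combiner that merges two containers (a sketch with an arbitrary stream):

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// the supplier creates one container per chunk, the accumulator mutates it,
// and the combiner merges the per-chunk containers
List<String> list = Stream.of("a", "bb", "ccc")
                          .parallel()
                          .collect(ArrayList::new, List::add, List::addAll);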

Determining the range of a stream of numbers

class Range {
  static Range NONE = new Range(Double.NaN, Double.NaN);

  final double min, max;

  static Range of(double min, double max) {
    if(Double.isNaN(min) || Double.isNaN(max) || min>max) {
      throw new IllegalArgumentException();
    }
    return new Range(min, max);
  }

  private Range(double min, double max) {
    this.min = min;
    this.max = max;
  }

  boolean contains(double value) {
    return this!=Range.NONE && min<=value && max>=value;
  }

  boolean spans(Range other) {
    return this==other
      || other==Range.NONE
      || (contains(other.min) && contains(other.max));
  }

}

Range range = streamOfDoubles.reduce(
    Range.NONE,
    (range, value) -> {
      if(range==Range.NONE)
        return Range.of(value, value);
      else if(range.contains(value))
        return range;
      else
        return Range.of(Math.min(value, range.min), Math.max(value, range.max));
    },
    (a, b) -> {
      if(b.spans(a))
        return b;
      else if(a.spans(b))
        return a;
      else
        return Range.of(Math.min(a.min, b.min), Math.max(a.max, b.max));
    }
);
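
Feeding it a parallel stream (the example values here are mine), each chunk builds its own partial Range via the accumulator, and the combiner merges the partial Ranges:

import java.util.stream.Stream;

// arbitrary example input; plugged into the reduce above,
// the resulting Range spans the overall min and max: [-1.0, 7.25]
Stream<Double> streamOfDoubles = Stream.of(3.5, -1.0, 7.25, 0.0).parallel();

Since Range is immutable (each step returns a new instance rather than mutating one), reduce is the right tool here, unlike the mutable-collection case above.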