
After reading https://stackoverflow.com/a/38728166/7826451, where the answer says that a reduce operation is supposed to be performed on immutable objects, is the following usage wrong, and if so, why? It produces the result I would expect, which is the number 4 in the resulting 'foo' object.

@Test
public void foo() {
    List<Foo> foos = new ArrayList<>();
    foos.add(new Foo(1));
    foos.add(new Foo(1));
    foos.add(new Foo(1));
    foos.add(new Foo(1));
    Foo foo = foos.stream().reduce(new Foo(0), Foo::merge);

    System.out.println(foo.foo);
}

static class Foo {
    int foo;
    Foo(int f) {
        foo = f;
    }

    Foo merge(Foo other) {
        foo += other.foo;
        return this;
    }
}
Naman

1 Answer


Consider the following. Integer is immutable and Foo is mutable. Create a list of each.

List<Foo> foos = IntStream.range(1, 1001).mapToObj(Foo::new)
        .collect(Collectors.toList());
List<Integer> ints = IntStream.range(1,1001).boxed()
        .collect(Collectors.toList());

Now reduce each stream to a single result.

Foo foo = foos.stream().reduce(new Foo(0), Foo::merge);
Integer integer = ints.stream().reduce(Integer.valueOf(0), (a, b) -> a + b);

System.out.println(foo);
System.out.println(integer);

Prints

500500
500500

Both are correct.

Now reduce them again using threads via parallel streams, combining the different threads using the third argument to reduce.

Foo foo = foos.parallelStream().reduce(new Foo(0), Foo::merge, Foo::merge);
Integer integer = ints.parallelStream().reduce(Integer.valueOf(0), (a,b)->a+b, (a,b)->a+b);

System.out.println(foo);
System.out.println(integer);

Prints (the incorrect value varies from run to run)

570026
500500

Oops! The problem is that every worker thread starts from the same shared identity object, new Foo(0), and merge updates it in place. Multiple threads mutate that one object concurrently, without any synchronization, so updates are lost or double-counted.
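In fact the contract violation is visible even without threads: a plain sequential reduce already mutates the supposedly constant identity element. A minimal sketch (class and variable names are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class IdentityMutationDemo {
    static class Foo {
        int foo;
        Foo(int f) { foo = f; }
        // Mutating merge, as in the question.
        Foo merge(Foo other) { foo += other.foo; return this; }
    }

    public static void main(String[] args) {
        List<Foo> foos = IntStream.range(1, 1001).mapToObj(Foo::new)
                .collect(Collectors.toList());
        Foo identity = new Foo(0);
        Foo result = foos.stream().reduce(identity, Foo::merge);
        // The "identity" is no longer an identity: it was mutated in place
        // and is in fact the very object returned as the result.
        System.out.println(identity.foo);        // 500500, not 0
        System.out.println(result == identity);  // true
    }
}
```

reduce is specified so that the identity may be combined with partial results any number of times without changing them; an identity whose state drifts during the reduction breaks that assumption as soon as more than one partial reduction happens.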

If you modify the Foo class merge method to the following, all is well.

Foo merge(Foo other) {
   return new Foo(this.foo + other.foo);
}

So Foo can still be mutable via setters, but you should not mutate it inside a reduction operation. Always return a new instance instead of modifying the current one.


class Foo {
    int foo;

    Foo(int f) {
        foo = f;
    }

    Foo merge(Foo other) {
        return new Foo(foo + other.foo);
    }

    public String toString() {
        return foo + "";
    }
}
WJS
  • Since there’s a tendency of developers responding with “but if I don’t use parallel”, it’s worth noting that such problems with the wrong usage are not reserved to parallel processing. E.g. `Map m = IntStream.range(1, 1001).mapToObj(Foo::new).collect(Collectors.groupingBy(f -> f.foo % 8, Collectors.reducing(new Foo(0), Foo::merge))); System.out.println(m.values().stream().mapToInt(f -> f.foo).sum());` – Holger Feb 17 '21 at 08:34
  • Further, instead of using a merge function returning a new object, [Mutable Reduction](https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/util/stream/package-summary.html#MutableReduction) is possible by using `collect`. E.g. `Foo foo = foos.parallelStream().collect(() -> new Foo(0), Foo::merge, Foo::merge);` using the original mutable class works flawlessly… – Holger Feb 17 '21 at 08:43
  • Very interesting, I thought the problem would be precisely only if it was run in parallel. Could you elaborate a bit more on the example with the map and why it doesn't work the way some would expect? –  Feb 17 '21 at 17:06
  • @Holger could you give me a hint as to what causes this weird behaviour during the reduction in groupingBy, please? Thanks. –  Feb 18 '21 at 22:56
  • 1
    @fishysushi the first argument to `reduce` is the *identity* element that could get merged at arbitrary places, an arbitrary number of times, as its contract requires that this makes no difference. When you modify that object, you’re breaking the contract. In practice, this shows when multiple parallel reductions are performed in different worker threads or when multiple reductions are performed due to grouping, one per group. Which can be maxed out by grouping in parallel. The future could bring other operations where multiple or partial reductions are beneficial. – Holger Feb 19 '21 at 07:08
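Holger’s groupingBy example can be made runnable to see the sequential failure mode. In this sketch (class names are illustrative), the single new Foo(0) identity is shared by all eight groups, so every group’s “total” is really the grand total of all 1000 elements, counted once per group key:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GroupingReduceDemo {
    static class Foo {
        int foo;
        Foo(int f) { foo = f; }
        // Mutating merge: updates this and returns it, so the shared
        // identity accumulates every element from every group.
        Foo merge(Foo other) { foo += other.foo; return this; }
    }

    public static void main(String[] args) {
        Map<Integer, Foo> m = IntStream.range(1, 1001).mapToObj(Foo::new)
                .collect(Collectors.groupingBy(f -> f.foo % 8,
                        Collectors.reducing(new Foo(0), Foo::merge)));
        // All eight map values are references to the one mutated identity,
        // which now holds 500500, so the sum is inflated eightfold.
        int sum = m.values().stream().mapToInt(f -> f.foo).sum();
        System.out.println(sum); // 4004000 (8 × 500500), not 500500
    }
}
```

The stream here is purely sequential; the wrong result comes solely from eight per-group reductions all starting from, and mutating, the same identity object.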
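The collect-based mutable reduction Holger mentions can be sketched as follows (names are illustrative). Here mutation is safe because each worker gets its own fresh accumulator from the supplier, and partial results are folded together by the combiner at the end:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class MutableReductionDemo {
    static class Foo {
        int foo;
        Foo(int f) { foo = f; }
        // In-place mutation is fine under collect: no accumulator is shared.
        Foo merge(Foo other) { foo += other.foo; return this; }
    }

    public static void main(String[] args) {
        List<Foo> foos = IntStream.range(1, 1001).mapToObj(Foo::new)
                .collect(Collectors.toList());
        // Supplier creates a private Foo(0) per worker; accumulator and
        // combiner are used as BiConsumers, their return values ignored.
        Foo foo = foos.parallelStream()
                .collect(() -> new Foo(0), (a, b) -> a.merge(b), (a, b) -> a.merge(b));
        System.out.println(foo.foo); // 500500 on every run
    }
}
```

Unlike reduce, collect never hands a shared identity to multiple reductions, which is why the original mutable class works flawlessly here even in parallel.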