3

Given the code

List<Integer> numbers = Arrays.asList(2, 4, 3);

int sumTotal = numbers.stream().reduce(-3, (x, y) -> x + y + 3);
int multiplyTotal = numbers.stream().reduce(1, (x, y) -> x * y);

Is it possible to perform both operations while iterating the stream only once?

Also note each reduce has a different identity: -3 and 1.

cahen
  • 15,807
  • 13
  • 47
  • 78
  • 5
    The *identity value* for an accumulator function is the value [“that for all `t`, `accumulator.apply(identity, t)` is equal to `t`”](http://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#reduce-T-java.util.function.BinaryOperator-). You should recognize that `0` is *not* the identity value for `x+y+3`… – Holger Jul 17 '15 at 18:29
  • identity + y + 3 = y , you're totally right. (I fixed the question) – cahen Jul 17 '15 at 20:06

3 Answers3

4

You can use a pairing collector which I wrote in this answer:

int[] result = numbers.stream().collect(
        pairing(
                Collectors.reducing(0, (x, y) -> x + y + 3),
                Collectors.reducing(1, (x, y) -> x * y),
                (sum, prod) -> new int[] { sum, prod }));

Of course you can combine the results in any other way. Putting them to array is just an example.

This collector is readily available in my StreamEx library (MoreCollectors.pairing). Alternatively you may use jOOL library: it has Tuple.collectors method which works in similar manner.

Community
  • 1
  • 1
Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
  • nice, does your lib also support more than 2 collectors? – cahen Jul 17 '15 at 20:57
  • No, at most two. Well, you can chain this `pairing` method using `pairing(pairing(col1, col2, combine1_2), col3, combine1_2_3)`, but it's weird. I decided that combining two collectors is moreless common and useful, but combining three or more is too much and the code will become very confusing, so it's better to define new object like @JBNizet suggests. Also I decided (at least for now) not to define new functional interfaces and user-visible data containers like tuple in my lib and there's no `TriFunction` in JDK. AFAIC Jool has ready methods to combine up to 8 collectors, so you can use it. – Tagir Valeev Jul 18 '15 at 06:40
3

You can create a custom class, and use a mutable reduction:

public static void main(String[] args) {
    List<Integer> numbers = Arrays.asList(2, 4, 3);

    Result result = numbers.stream().collect(Result::new, Result::consume, Result::combine);
    System.out.println("result = " + result);
}

private static class Result {
    private int sum = 0;
    private int product = 1;

    public void consume(int i) {
        sum += i + 3;
        product *= i;
    }

    public void combine(Result r) {
        // READ note below
        sum += r.sum + 3;
        product *= r.product;
    }

    @Override
    public String toString() {
        return "Result{" +
            "sum=" + sum +
            ", product=" + product +
            '}';
    }
}

But I doubt you will gain much by doing that over simply iterating twice as you're doing.

EDIT:

Note that the combine() method above, invoked when using a parallel stream, will produce results that are inconsistent with a sequential stream. The problem is caused, as Holger mentions in the comments, by the fact that the operation doesn't respect the identity preconditions of a reduction: applying the reduction on any value V with the identity is supposed to produce V, but it produces V + 3 with 0 as identity.

Make sure not to use a parallel stream to compute that result.

JB Nizet
  • 678,734
  • 91
  • 1,224
  • 1,255
  • You're absolutely right. You're not making it go any faster and the resulting code is way harder to understand. @Tagir's solution is slightly better. In terms of performance/readability, mine is closer to the original...you just have to create a throwaway class. Alternatively, in Java 8, can you use ints in a Map? – U Avalos Jul 17 '15 at 17:26
  • 1
    Your solution is actually the same as mine. The main difference is that, since operations are supposed to be parallelizable, I have to provide a combine operation that you JS code doesn't have to provide. Using a Map would make it worse: less readable, implying more boxing/unboxing operations, and get()/put() on the map. – JB Nizet Jul 17 '15 at 17:31
  • 2
    Your combiner is wrong. For a correct reduction, a combiner has to apply the same function to the partial results as the sequential reduction, i.e. `x+y+3` rather than just summing up. Interestingly, it *appears* to produce consistent results as it compensates the error of the questioner: `0` is *not* the identity value of `x+y+3`. So each thread is off by `3` which your combiner compensates by not adding the `3` and thus the final result is always the same, off by `3` as in the question’s original code. But the correct result should be the same as by `.reduce((x, y) -> x + y + 3).get()` – Holger Jul 17 '15 at 18:21
  • @Holger If you look at the history, you'll see that I initially implemented it correctly, then tested it and saw that it didn't produce consistent results with sequential streams, and thought that the operation was not associative, but couldn't come with an example proving it. So I doubt and thought my own logic was incorrect, and thus removed the + 3, which produced consistent results. I see now that the problem is not really associativity, but identity: apply(identity, T) must produce T, and produces T + 3. What would you advise doing in such a case? throw an exception from combine()? – JB Nizet Jul 17 '15 at 18:27
  • 1
    I don’t think that it is the task of the combiner to verify the correctness of the function constraints. I don’t see how it can do that either. Note that the stream’s standard `reduce` implementation won’t verify it but just produces inconsistent results as well, if you fail to provide a correct identity. – Holger Jul 17 '15 at 18:33
  • 1
    Well, using `-3` as identity value would solve it as well… – Holger Jul 17 '15 at 18:49
  • Ah, I see. The problem is not that it's not parallelizable. The operation can be correctly made in parallel if the correct identity is chosen: -3. So indeed, the combiner shouldn't throw an exception. Right? – JB Nizet Jul 17 '15 at 18:55
  • So it's possible to input the wrong identity and don't get a single error or even a warning, sounds dangerous – cahen Jul 17 '15 at 20:12
  • As for the initial question, the performance gain doesn't seem relevant with integers, but suppose each item of the list is a File that needs to be read so 2 different operations (e.g. count digits and count letters) could run. I definitely wouldn't want to iterate the list twice. Also, I wouldn't want to store the content of all files in memory to avoid opening them twice. So imo this solution is faster. On the other hand, the "collect" part of it was delegated to a non-functional code, unlike @Tagir's answer, but is still faster nonetheless – cahen Jul 17 '15 at 20:47
  • and the Result class could be adapted to support a list of different operations whilst the `pairing` only supports 2 collectors – cahen Jul 17 '15 at 20:47
  • @cahen: Tagir’s solution does the same, but the collector is hidden inside the StreamEx library. Using such an abstraction has a point, but since it’s a generic solution, it has to box the numbers into objects, which is most likely the reason why JB Nizet’s solution is faster… – Holger Jul 20 '15 at 09:12
1

In JS, one solution is to iterate over an object instead of a number. Ex:

  numbers.stream().reduce({sum:0,prod:1}, (memo, cur) ->
       memo.sum = memo.sum + cur + 3;
       memo.prod = memo.prod * cur;
       return memo;
  )

Note that in your example, x is the "memo" (or accumulator) and y is the current value.

The same idea should also work in Java.

U Avalos
  • 6,538
  • 7
  • 48
  • 81