8

My program has this line:

Function<String, Integer> f = (String s) -> s.chars().reduce(0, (a, b) -> 2 * a + b);

The function being passed to reduce is not associative. Reduce's documentation says that the function passed must be associative.

How can I rewrite this as an expression which doesn't break reduce's contract?

Eugene
  • 117,005
  • 15
  • 201
  • 306
Nick ODell
  • 15,465
  • 3
  • 32
  • 66
  • 3
    You can't. Just use a goold old loop. – JB Nizet Nov 13 '17 at 17:28
  • Are you talking about the function `f` or `(a, b) -> 2 * a + b`? if it's `f`, I think it's safe to use it in any parallel stream; if it's `(a, b) -> 2`, it will be ok too because i don't see any reason to do: `s.chars().parallel().reduce(0, (a, b) -> 2 * a + b)` – 123-xyz Nov 13 '17 at 18:59

2 Answers2

6

Under the current implementation and IFF you are not going to use parallel - you are safe with what you have right now. Obviously if you are OK with these disclaimers.

Or you can obviously create the function with a for loop:

 Function<String, Integer> f = s -> {
        int first = s.charAt(0) * 2 + s.charAt(1);
        int total = first;

        for (int x = 1; x < s.length() - 1; x++) {
            total = total * 2 + s.charAt(x + 1);
        }

        return total;

    };
Eugene
  • 117,005
  • 15
  • 201
  • 306
  • It's really the 'under the current implementation' part that bothers me. The Java devs have pacifically reserved the right to change how this works. – Nick ODell Nov 13 '17 at 17:50
  • Pacifically? You mean as opposed to having a fight over it? – Klitos Kyriacou Nov 13 '17 at 17:56
  • @NickODell yeah, that's the bitter part. You can still create a `Function` out of that with a simple for loop (edited the post) – Eugene Nov 13 '17 at 18:01
  • 2
    @NickODell you can even do a `boxed.reduce(three-arguments)` and simply throw an `Exception` for the merger that will get invoked for parallel computation; but *even that* is not guaranteed to be used *only* for parallel processing but the spec – Eugene Nov 13 '17 at 18:03
  • @Eugene That one still requires that the second argument be associative. https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#reduce-U-java.util.function.BiFunction-java.util.function.BinaryOperator- – Nick ODell Nov 13 '17 at 18:12
  • 2
    @NickODell I know, my point was that you can *explicitly* prohibit parallelization this way – Eugene Nov 13 '17 at 20:04
  • 1
    This looks very much like the hash code algorithm, except that it uses `2` instead of `31`. So generally, it’s possible to parallelize, see [this answer](https://stackoverflow.com/a/39396614/2711488), but for most use cases, you surely want to stick to the loop. But why aren’t you simply using `s -> { int total = 0; for(int x = 0; x < s.length(); x++) total = total * 2 + s.charAt(x); return total; };`? – Holger Nov 14 '17 at 09:25
3

You can convert this function to an associative function, as explained in this answer at the example of List.hashCode(). The difference lies only in the factor (2vs.31) and the start value (1vs.0).

It can be adapted to your task, which is especially easy when you have a random access input like a String:

Function<String, Integer> f =
    s -> IntStream.range(0, s.length()).map(i -> s.charAt(i)<<(s.length()-i-1)).sum();

This would even run in parallel, but it’s unlikely that you ever encounter such humongous strings that a parallel evaluation provides a benefit. So what remains, is that most people might consider this solution less readable than a simple for loop…


Note that the above solution exhibits a different overflow behavior, i.e. if the String has more than 32 chars, due to the usage of the shift operator rather than multiplying with two.
The fix for this issue makes the solution even more efficient:

Function<String, Integer> f = s ->
    IntStream.range(Math.max(0, s.length()-32), s.length())
             .map(i -> s.charAt(i)<<(s.length()-i-1)).sum();

If the string has more than 32 chars, it only processes the last 32 chars, which is already sufficient to calculate the same result as your original function.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • I actually am writing a hash function - I was trying to provide a simple, self-contained, example. Thank you for the link to the other question - I thought I searched for duplicates well enough, but apparently not. – Nick ODell Nov 15 '17 at 02:12