
I'm trying to understand why the result of this example is always true. Here's my example:

String s1 = Arrays.asList("A", "E", "I", "O", "U").stream()
        .reduce("", String::concat);
String s2 = Arrays.asList("A", "E", "I", "O", "U").parallelStream()
        .reduce("", String::concat);

System.out.println(s1.equals(s2));

This always prints true. From what I know, with parallelStream we can't predict the result. Can someone explain why?

ETO
e2rabi

3 Answers


If you take a look at the docs of Stream.reduce() you will find this:

Performs a reduction on the elements of this stream [...] and returns the reduced value. This is equivalent to:

 T result = identity;
 for (T element : this stream)
     result = accumulator.apply(result, element)
 return result;

but is not constrained to execute sequentially.

So Stream.reduce() is free to process the elements in parallel, but it combines the partial results in encounter order, so the reduced value is the same as a sequential run would produce (given an associative accumulator and a proper identity).
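You can check this empirically. A minimal sketch (assuming java.util.Arrays is imported): repeating the parallel reduction never produces anything other than AEIOU:

for (int i = 0; i < 1_000; i++) {
    String s = Arrays.asList("A", "E", "I", "O", "U").parallelStream()
            .reduce("", String::concat);
    if (!s.equals("AEIOU")) {
        System.out.println("Unexpected result: " + s); // never printed
    }
}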

If you try using Stream.forEach() and print each value like this:

Arrays.asList("A", "E", "I", "O", "U").stream()
        .forEach(System.out::print);
System.out.println();
Arrays.asList("A", "E", "I", "O", "U").parallelStream()
        .forEach(System.out::print);

You will get this result (or something similar in the second line):

AEIOU
IUOEA

(Stream.forEachOrdered() with the example above will also print the values in order)
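For instance, this sketch (same list as above) prints AEIOU on every run, even though the stream is parallel:

Arrays.asList("A", "E", "I", "O", "U").parallelStream()
        .forEachOrdered(System.out::print); // always prints AEIOU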

Samuel Philipp
  • An important part here is also the fact that `String::concat` is *associative* (so for a function `fun` it always holds that `(a fun b) fun c == a fun (b fun c)`). If the accumulator bi-function weren't associative, we wouldn't get correct results. For instance, reducing a stream containing `1,2,3,4` with the accumulator `(a,b) -> a-b`, instead of being executed like `(((1-2)-3)-4) = -8`, could in parallel processing be executed like `(1-2)-(3-4) = (-1) - (-1) = 0`. Demo: https://ideone.com/wUWknA – Pshemo Oct 19 '19 at 22:14
  • @Pshemo Another factor that drives it is the identity value, as mentioned in [this answer](https://stackoverflow.com/a/58469499/1746118): the results could still be unpredictable even if the accumulator is associative. – Naman Oct 20 '19 at 02:12
  • @Naman I have recently found out what "identity" is called in functional languages (Scala) in certain cases, it's beautiful! – Eugene Oct 20 '19 at 02:26
  • @Eugene Any references? I would like to read more about what you are pointing to. :) – Naman Oct 20 '19 at 02:33
  • @Naman "monoid" - it does not map one-to-one to identity, but it's an awesome name! – Eugene Oct 20 '19 at 02:34

There are actually two factors that drive the predictability of using reduce with parallel streams:

Associativity of the accumulator

As others have mentioned as well, the binary operator provided to accumulate the stream must be associative to produce predictable results. This means that how the elements are grouped while performing the operation doesn't matter:

(((a op b) op c) op d) = ((a op b) op (c op d))

e.g. a binary operation that is not associative produces different results when used in a parallel reduction:

List<String> strings = List.of("An", "example", "of", "a", "binary", "operator");
System.out.println(strings.stream().reduce("", (s, str) -> String.valueOf(s.equals(str))));
System.out.println(strings.stream().parallel().reduce("", (s, str) -> String.valueOf(s.equals(str))));
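Here the sequential version always prints false (the running result never equals the next word), while the parallel version may also print true, because partial results such as "false" and "false" compare equal when combined.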

The identity value

If the value provided as the identity element is actually not an identity, the results can still be unpredictable. Hence it must satisfy:

identity op x = x
x op identity = x

e.g. the element " " (space), which is not an identity element for string concatenation, yields different results when using parallel streams:

List<String> identityCheck = List.of("An", "example", "of", "a", "identity", "element");
System.out.println(identityCheck.stream().reduce(" ", String::concat));
System.out.println(identityCheck.stream().parallel().reduce(" ", String::concat));
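The reason is that in a parallel reduction each chunk of the stream starts from the identity value, so the extra space is prepended once per chunk. How many chunks there are depends on how the work gets split, which is why the number of spaces can vary from run to run.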

Not only does the parallel execution here produce an unexpected result, but different runs on the same data set may also produce different results.
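For contrast, a minimal sketch with the true identity "" (same list) is deterministic, sequential or parallel:

List<String> identityCheck = List.of("An", "example", "of", "a", "identity", "element");
// "" really is an identity for concat, so both lines print the
// same string on every run: Anexampleofaidentityelement
System.out.println(identityCheck.stream().reduce("", String::concat));
System.out.println(identityCheck.stream().parallel().reduce("", String::concat));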

Naman

TL;DR

String concatenation is an associative operation, so the sequential and the parallel reduction will yield the same result.


Long answer

The idea behind parallel streams is to use multiple cores to process the elements, so the processing order is not guaranteed.

If your operation is associative (like addition of integers or concatenation of strings), then the parallel processing will boost the performance, while yielding the same result:

( "A", "E", "I", "O", "U" ) reduce using concatenation operator = ? 
  • Sequential reduce

    "A" + "E" = "AE"
    "AE" + "I" = "AEI"
    "AEI" + "O" = "AEIO"
    "AEIO" + "U" = "AEIOU"
    
  • Parallel reduce (one of scenarios)

    "A" + "E" = "AE"
    "I" + "O" = "IO"
    "IO" + "U" = "IOU"
    "AE" + "IOU" = "AEIOU"  // same result
    

Note that even in the parallel case the partial results are combined in encounter order ("AE" + "IOU", never "IOU" + "AE"), so the operation only needs to be associative, not commutative.

In contrast, when your operation is non-associative (like subtraction of integers), the parallel processing will yield an unpredictable result (a runnable sketch follows the walkthrough):

(1 , 2 , 3 , 4) reduce using - operator = ? 
  • Sequential reduce

    1 - 2 = -1
    -1 - 3 = -4
    -4 - 4 = -8
    
  • Parallel reduce (one of scenarios)

    1 - 2 = -1
    3 - 4 = -1
    -1 - -1 = 0  // different result
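
Here is a minimal runnable version of that walkthrough (Java 10+; the class name is mine):

import java.util.List;

public class NonAssociativeReduce {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4);
        // Sequential: always ((1 - 2) - 3) - 4 = -8
        System.out.println(numbers.stream().reduce((a, b) -> a - b).orElseThrow());
        // Parallel: the grouping varies, e.g. (1 - 2) - (3 - 4) = 0
        System.out.println(numbers.stream().parallel().reduce((a, b) -> a - b).orElseThrow());
    }
}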
    
ETO