14

I am having trouble comprehending why parallel stream and stream are giving a different result for the exact same statement.

    List<String> list = Arrays.asList("1", "2", "3");
    String resultParallel = list.parallelStream().collect(StringBuilder::new,
            (response, element) -> response.append(" ").append(element),
            (response1, response2) -> response1.append(",").append(response2.toString()))
            .toString();
    System.out.println("ResultParallel: " + resultParallel);

    String result = list.stream().collect(StringBuilder::new,
            (response, element) -> response.append(" ").append(element),
            (response1, response2) -> response1.append(",").append(response2.toString()))
            .toString();

    System.out.println("Result: " + result);

ResultParallel: 1, 2, 3

Result: 1 2 3

Can somebody explain why this is happening and how I get the non-parallel version to give the same result as the parallel version?

Óscar Andreu
  • 1,630
  • 13
  • 32
Claudiga
  • 427
  • 1
  • 4
  • 15
  • in parallel stream, you join two response with comma(,), but in case of non parallel, it's it always response and element, because it is always getting single element. – random_user Aug 07 '18 at 10:21
  • 1
    as a side note the same thing can be achieved with `Collectors.joining` – Eugene Aug 07 '18 at 10:34

3 Answers3

12

The Java 8 Stream.collect method has the following signature:

<R> R collect(Supplier<R> supplier,
              BiConsumer<R, ? super T> accumulator,
              BiConsumer<R, R> combiner);

Where BiConsumer<R, R> combiner is called only in the parallel streams (in order to combine partial results into a single container), therefore the output of your first code snippet is:

ResultParallel: 1, 2, 3

In the sequential version the combiner doesn't get called (see this answer), therefore the following statement is ignored:

(response1, response2) -> response1.append(",").append(response2.toString())

and the result is different:

1 2 3

How to fix it? Check @Eugene's answer or this question and answers.

Oleksandr Pyrohov
  • 14,685
  • 6
  • 61
  • 90
  • 1
    Not that I don't believe you, but is there documentation that explains this? – Slaw Aug 07 '18 at 10:24
  • 3
    Here is an [answer](https://stackoverflow.com/questions/29210176/can-a-collectors-combiner-function-ever-be-used-on-sequential-streams#answer-29295055) from Stuart Marks. – Oleksandr Pyrohov Aug 07 '18 at 10:32
  • 1
    @Slaw - See my answer. The javadoc hints at what is going on, though it doesn't directly explain it. – Stephen C Aug 07 '18 at 10:39
8

To understand why this is going wrong, consider this from the javadoc.

accumulator - an associative, non-interfering, stateless function that must fold an element into a result container.

combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.

What this is saying is that it should not matter whether the elements are collected by "accumulating" or "combining" or some combination of the two. But in your code, the accumulator and the combiner concatenate using a different separator. They are not "compatible" in the sense required by the javadoc.

That leads to inconsistent results depending on whether sequential or parallel streams are used.

  • In the parallel case, the stream is split into substreams1 to be handled by different threads. This leads to a separate collection for each substream. The collections are then combined.

  • In the sequential case, the stream is not split. Instead, the stream is simply accumulated into a single collection, and no combining needs to take place.


Observations:

  • In general, for a stream of this size performing a simple transformation, parallelStream() is liable to make things slower.

  • In this specific case, the bottleneck with the parallelStream() version will be the combining step. That is a serial step, and it performs the same amount of copying as the entire serial pipeline. So, in fact, parallelization is definitely going to make things slower.

  • In fact, the lambdas do not behave correctly. They add an extra space at the start, and double some spaces if the combiner is used. A more correct version would be:

    String result = list.stream().collect(
        StringBuilder::new,
        (b, e) -> b.append(b.isEmpty() ? "" : " ").append(e),
        (l, r) -> l.append(l.isEmpty() ? "" : " ").append(r)).toString();
    
  • The Joiner class is a far simpler and more efficient way to concatenate streams. (Credit: @Eugene)


1 - In this case, the substreams each have only one element. For a longer list, you would typically get as many substreams as there are worker threads, and the substreams would contain multiple elements.

Community
  • 1
  • 1
Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • 1
    I would add on the observation part that this entire code is not really needed and one can use `Collectors.joining` – Eugene Aug 07 '18 at 10:39
  • Yes. There is a great temptation to try to do "clever things" with Java 8 streams ... only to discover that the end result is slower AND harder to understand. But to be honest, I don't understand what the OP is *trying* to achieve here. (Why make the accumulator and combiner incompatible? What would you *expect* should happen?) – Stephen C Aug 07 '18 at 10:42
  • I usually try do things as simple as possible but to the lack of my full knowledge of streams and collector API as i am still learning i probably made this more complicated than it needed to be. What would be the best way to get add a comma between values in a list and return result as a string using streams? – Claudiga Aug 07 '18 at 10:48
7

As a side note, even if you replace , with a space in the combiner, your results are still going to differ (slightly altered the code to make it more readable):

String resultParallel = list.parallelStream().collect(
            StringBuilder::new,
            (builder, elem) -> builder.append(" ").append(elem),
            (left, right) -> left.append(" ").append(right)).toString();

    String result = list.stream().collect(
            StringBuilder::new,
            (builder, elem) -> builder.append(" ").append(elem),
            (left, right) -> left.append(" ").append(right)).toString();


  System.out.println("ResultParallel: ->" + resultParallel + "<-"); // -> 1  2  3  4<-
  System.out.println("Result: ->" + result + "<-"); // -> 1 2 3 4<-

Notice how you have a little too many spaces.

The java-doc has the hint:

combiner... must be compatible with the accumulator function

If you want to join, there are simpler options like:

String.join(",", yourList)
yourList.stream().collect(Collectors.joining(","))
Eugene
  • 117,005
  • 15
  • 201
  • 306