58

I was watching a presentation on Java, and at one point, the lecturer said:

"Mutability is OK, sharing is nice, shared mutability is devil's work."

What he was referring to is the following piece of code, which he considered an "extremely bad habit":

//double the even values and put that into a list.
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);
List<Integer> doubleOfEven = new ArrayList<>();

numbers.stream()
       .filter(e -> e % 2 == 0)
       .map(e -> e * 2)
       .forEach(e -> doubleOfEven.add(e));

He then proceeded to write the code that should be used, which is:

List<Integer> doubleOfEven2 =
      numbers.stream()
             .filter(e -> e % 2 == 0)
             .map(e -> e * 2)
             .collect(toList());

I don't understand why the first piece of code is a "bad habit". To me, they both achieve the same goal.

George Cernat
  • Make the stream parallel, and suddenly, the order is not respected anymore, or worse, the list is corrupted because several threads are mutating it concurrently, without synchronization. It doesn't happen with the second version. – JB Nizet May 27 '17 at 17:48
  • @JBNizet in the example that is not because of `parallel` (which has nothing to do with the order of the resulting collection), but about `forEach`. – Eugene May 28 '17 at 06:20
  • @Eugene there is no sharing if there is a single thread. So, since this is about shared mutability, yes, this is about parallel. And yes, making that code parallel, even with a synchronized collection, would make the order of the collection non-deterministic (it's actually already not guaranteed to have a deterministic order without parallel, even though it is in practice). – JB Nizet May 28 '17 at 06:23
  • I'm pretty sure this belongs on [SoftwareEngineering.SE]. For the record, I disagree with a saying that says, "Mutability is okay." Mutability gets you in trouble. It's more a necessary evil. – jpmc26 May 28 '17 at 06:56
  • @JBNizet should re-phrase that probably. There is no order *because* of `forEach`, as a consequence of applying `parallel`. My point was that parallel/sequential do not dictate the order; it's the operations that do. – Eugene May 28 '17 at 08:58
  • @Eugene OK, I can agree on that. – JB Nizet May 28 '17 at 09:00
  • @George Cernat why revert the edit? Of course the title describes your current problem, but it's repeated within the description again... The title should just give an insight into what the description will elaborate on, and the title I added before, _Why is shared mutability bad?_, is much more generic and therefore will help people easily find it in the future. You can even check out [**How do I ask a good question?**](https://stackoverflow.com/help/how-to-ask) for clarification if needed. I hope that convinces you to revert the title to how it was. – Ousmane D. May 28 '17 at 12:48
  • @Aominè you are right – George Cernat May 28 '17 at 13:41
  • Presentation in question is available here https://youtu.be/1OpAgZvYXLQ – Tarun Dec 12 '18 at 07:06

4 Answers

45

Explanation of the first example snippet

The problem comes into play when performing parallel processing.

//double the even values and put that into a list.
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);
List<Integer> doubleOfEven = new ArrayList<>();

numbers.stream()
       .filter(e -> e % 2 == 0)
       .map(e -> e * 2)
       .forEach(e -> doubleOfEven.add(e)); // <--- Unnecessary use of side-effects!

This unnecessarily uses side effects. Not all side effects are bad if used correctly, but when it comes to streams, one must provide behaviour that is safe to execute concurrently on different pieces of the input, i.e. write code that doesn't access shared mutable data to do its work.

The line:

.forEach(e -> doubleOfEven.add(e)); // Unnecessary use of side-effects!

unnecessarily uses side effects, and when executed in parallel, the lack of thread safety of ArrayList would cause incorrect results.
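
To make the hazard concrete, here is a minimal, self-contained sketch (my own illustration, not from the lecture; the class name is hypothetical) that scales the input up and runs the side-effecting pipeline in parallel. On most runs the resulting size is wrong because of lost updates, and occasionally an add even throws, because ArrayList is not thread safe:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SharedMutabilityDemo {
    public static void main(String[] args) {
        // A larger input makes the race easy to observe.
        List<Integer> numbers = IntStream.rangeClosed(1, 100_000)
                                         .boxed()
                                         .collect(Collectors.toList());

        // Side effects: every worker thread adds to the same unsynchronized ArrayList.
        List<Integer> doubleOfEven = new ArrayList<>();
        try {
            numbers.parallelStream()
                   .filter(e -> e % 2 == 0)
                   .map(e -> e * 2)
                   .forEach(e -> doubleOfEven.add(e)); // shared mutable state!
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayList internals were corrupted: " + e);
        }

        // Expected 50000 elements; typically fewer are reported due to lost updates.
        System.out.println("size: " + doubleOfEven.size());
    }
}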

A while back I read a blog post by Henrik Eichenhardt explaining why shared mutable state is the root of all evil.

Here is a short piece of reasoning, extracted from the blog, as to why shared mutability is not good.

non-determinism = parallel processing + mutable state

This equation basically means that both parallel processing and mutable state combined result in non-deterministic program behaviour. If you just do parallel processing and have only immutable state everything is fine and it is easy to reason about programs. On the other hand if you want to do parallel processing with mutable data you need to synchronize the access to the mutable variables which essentially renders these sections of the program single threaded. This is not really new but I haven't seen this concept expressed so elegantly. A non-deterministic program is broken.

The blog goes on to derive in detail why parallel programs without proper synchronization are broken, which you can find within the appended link.
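
For completeness, here is a small sketch (mine, not from the blog) of the synchronization route the quote describes: wrapping the target list with Collections.synchronizedList makes the parallel forEach safe in the sense that no updates are lost, but every add now contends for the same lock, and the encounter order is still not preserved.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SynchronizedSideEffectDemo {
    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 100_000)
                                         .boxed()
                                         .collect(Collectors.toList());

        // Guarding the shared list: correct size, but each add serializes on the
        // list's lock, and the resulting order is arbitrary.
        List<Integer> doubleOfEven = Collections.synchronizedList(new ArrayList<>());
        numbers.parallelStream()
               .filter(e -> e % 2 == 0)
               .map(e -> e * 2)
               .forEach(doubleOfEven::add);

        System.out.println("size: " + doubleOfEven.size()); // always 50000 now
    }
}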

Explanation of the second example snippet

List<Integer> doubleOfEven2 =
      numbers.stream()
             .filter(e -> e % 2 == 0)
             .map(e -> e * 2)
             .collect(toList()); // No side-effects! 

This performs a mutable reduction operation on the elements of the stream using a Collector.

This is much safer, more efficient, and more amenable to parallelization.
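
As a rough illustration (again mine, not from the answer; the class name is made up), the collector version can be switched to a parallel stream without any of the caveats above, and it still preserves the encounter order:

import java.util.Arrays;
import java.util.List;
import static java.util.stream.Collectors.toList;

public class CollectDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);

        // Each worker accumulates into its own intermediate list; the partial
        // lists are merged at the end, so no shared mutable state is touched.
        List<Integer> doubleOfEven = numbers.parallelStream()
                                            .filter(e -> e % 2 == 0)
                                            .map(e -> e * 2)
                                            .collect(toList());

        System.out.println(doubleOfEven); // always [4, 8, 4, 8]
    }
}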

Ousmane D.
  • Even in the case of a mutable, synchronized list, you would end up having thread contention over the list, thus rendering parallel processing almost useless. – fps May 27 '17 at 22:44
  • That extract oversimplifies the situation, though it is fine as a rule of thumb. You can have parallel processing and mutable state without losing determinism by putting some restrictions on one or the other. For example, using lattice variables partially limits mutability. Obviously various forms of synchronization or coordination limit parallelism without limiting mutability. Subprograms can be non-deterministic while the whole program still meets a deterministic specification, and non-determinism can be part of the spec, so programs aren't inherently broken due to being non-deterministic. – Derek Elkins left SE May 27 '17 at 23:55
  • @FedericoPeraltaSchaffner I think that was the point of the statement in the quote: "if you want to do parallel processing with mutable data you need to synchronize the access to the mutable variables which essentially renders these sections of the program single threaded." – java-addict301 Sep 15 '17 at 15:18
16

The thing is that the lecturer is also slightly wrong here. The example that he provided uses forEach, which is documented as:

The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism...

You could use:

numbers.stream()
       .filter(e -> e % 2 == 0)
       .map(e -> e * 2)
       .parallel()
       .forEachOrdered(e -> doubleOfEven.add(e));

And you would always have the same guaranteed result.

On the other hand, the example that uses Collectors.toList is better, because Collectors respect the encounter order, so it works just fine.

An interesting point is that Collectors.toList uses ArrayList underneath, which is not a thread-safe collection. It's just that it uses many of them (for parallel processing) and merges them at the end.
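
A rough sketch of what that looks like, written as a hand-rolled collector (equivalent in spirit to Collectors.toList(), not its actual source): the supplier hands each parallel chunk of work its own ArrayList, the accumulator adds to it without any locking, and the combiner merges the partial lists in encounter order at the end.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class HandRolledToList {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);

        List<Integer> doubleOfEven = numbers.parallelStream()
                .filter(e -> e % 2 == 0)
                .map(e -> e * 2)
                .collect(Collector.of(
                        ArrayList::new, // supplier: a fresh list per chunk of work
                        List::add,      // accumulator: no sharing, no locks needed
                        (left, right) -> { left.addAll(right); return left; })); // combiner: merge partial results in order

        System.out.println(doubleOfEven); // [4, 8, 4, 8]
    }
}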

One last note: parallel and sequential do not influence the encounter order; it's the operations applied to the Stream that do. Excellent read here.

We also need to keep in mind that even using a thread-safe collection is still not completely safe with Streams, especially when relying on side-effects.

List<Integer> numbers = Arrays.asList(1, 3, 3, 5);
Set<Integer> seen = Collections.synchronizedSet(new HashSet<>());
List<Integer> collected = numbers.stream()
        .parallel()
        .map(e -> {
            if (seen.add(e)) {
                return 0;
            } else {
                return e;
            }
        })
        .collect(Collectors.toList());

System.out.println(collected);

collected at this point could be [0,3,0,0] OR [0,0,3,0] or something else.

Eugene
6

Assume two threads perform this task at the same time, the second thread one instruction behind the first.

The first thread creates doubleOfEven. The second thread creates doubleOfEven as well, and the instance created by the first thread will be garbage collected. Then both threads add the doubles of all even numbers to doubleOfEven, so it will contain 0, 0, 4, 4, 8, 8, 12, 12, ... instead of 0, 4, 8, 12, ... (In reality these threads won't be perfectly in sync, so anything that can go wrong will go wrong.)

Not that the second solution is that much better. You would have two threads setting the same global. In this case, they are both setting it to logically equal values, but if they set it to two different values, then you don't know which value you have afterwards. One thread will not get the result it wants.

gnasher729
0

In the first example, if you were to use parallel(), you’d have no guarantees about the insertions (multiple threads inserting the same element, for example).

collect(...), on the other hand, when run in parallel, splits the work, collects the partial results internally in an intermediate step, and then merges them into the final list, ensuring order and safety.
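
One compact way to see that split-and-merge shape (a sketch of mine, not part of the original answer) is the three-argument collect overload, where the supplier, accumulator, and combiner are spelled out explicitly and behave like the toList() version above:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitAndMergeDemo {
    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5);

        // Mutable reduction: each chunk of the parallel work fills its own
        // ArrayList (supplier + accumulator); the partial lists are then
        // merged with addAll (combiner), which keeps the encounter order.
        List<Integer> doubleOfEven = numbers.parallelStream()
                                            .filter(e -> e % 2 == 0)
                                            .map(e -> e * 2)
                                            .collect(ArrayList::new, List::add, List::addAll);

        System.out.println(doubleOfEven); // [4, 8, 4, 8]
    }
}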

Sierra Bravo