4

Looking at Java java.util.stream package documentation a doubt arose regarding best practices to follow in streams usage. Considering this code:

HashMap<Integer,Integer> map = new HashMap<>();
map.put(1,1);
map.put(2,2);
map.put(3,3);
map.put(4,4);
map.keySet().parallelStream().forEach(key -> {
        if (key == 3) { 
            map.put(3,0);
        }
});
  1. Is code ouput always equals to ([1,1],[2,2],[3,0],[4,4])?
  2. Can map.put(3,0) be considered as a non interfering operation?
  3. Can map.put(3,0) be considered as an accetable side-effect?

In other words, the above code can be considered compliant with the best practices suggested in streams documentation?

Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
  • 3
    Is this a duplicate of https://stackoverflow.com/questions/32837415/non-interference-exact-meaning-in-java-8-streams? – Andrew Gilmartin Oct 22 '18 at 13:36
  • 2
    If it was possible to break code multiple times, you did it. You are violating the contract by modifying the source of the stream and you are violating it by modifying a `HashMap`, which is not thread safe, from an arbitrary unknown thread. Further, you are expecting the output to be `[1,1],[2,2],[3,0],[4,4]`, despite the `HashMap` does not guaranty any order, so even without the other contract violations, you can never assume that result for granted. In the end, you are asking whether doing the opposite of the documented constraints “can be considered compliant with the best practices”… – Holger Oct 22 '18 at 15:58

3 Answers3

6

Your example definitely violates the non interference requirement:

For most data sources, preventing interference means ensuring that the data source is not modified at all during the execution of the stream pipeline. The notable exception to this are streams whose sources are concurrent collections, which are specifically designed to handle concurrent modification.

Your data source, a HashMap, is not designed to handle concurrent modification and therefore it should not be modified at all during the execution of the stream pipeline.

Therefore the answer to your 2nd and 3rd questions is no.

As for the first question, your specific code may still produce the expected result, since your condition ensures only one thread will ever call map.put(3,0). However, this is still considered an incorrect usage of Streams.

Eran
  • 387,369
  • 54
  • 702
  • 768
  • 1
    That’s still getting the result by chance. There is only one thread modifying the map, but it’s doing it while other threads are reading it/iterating over it. The set of possible results still is much bigger and includes getting exceptions or infinite loops. This heavily depends on implementation details of `HashMap`, e.g. the distribution of these four entries in a `HashMap` with default capacity is not very parallel friendly. Further, it must be emphasized that expecting a particular output for a `HashMap`, is wrong, even in the sequential case, as the order of the entries is not guaranteed. – Holger Oct 22 '18 at 16:04
1

No, no, no.
Avoid side effects.
Sample code compliant with the documentation:

Map<Integer,Integer> updatedMap = map.keySet().parallelStream()
        .filter(e -> e == 3)
        .collect(Collectors.toMap(Function.identity(), e -> 0));
map.putAll(updatedMap);
Ilya
  • 720
  • 6
  • 14
1

Compare (a)

map.keySet().parallelStream().forEach(key -> {
        if (key == 3) { 
            map.put(3, 0);
        }
});

(a new entry is added)

with (b)

map.entrySet().parallelStream().forEach(e -> {
        if (e.getKey() == 3) { 
            e.setValue(0);
        }
});

(no Entry object is created, moved. But beware of LinkedHashMap.)

  • (a) Unsafe
  • (b) Safe

    1. Is code ouput always equals to ([1,1],[2,2],[3,0],[4,4])?

      (a) no (b) yes

    2. Can map.put(3, 0) be considered as a non interfering operation?

      (a) no (b) setValue(0) yes

    3. Can map.put(3, 0) be considered as an acceptable side-effect?

      (a) no (b) setValue(0) yes

So (a) is evil and (b) is allright.

Why the mention of the entrySet.setValue?

Actually HashMap.put in the Oracle implementation probably does the same as Entry.setValue. That would require the use of implementation knowledge - ugly.

Whereas Entry.setValue is based on a backing of the original map, and one might infere that just the value field is overwritten. Notice that a LinkedHashMap needs to reorder the entry, and that reordering is again unsafe.

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
  • 1
    Note that a `LinkedHashMap` does not reorder on `Entry.setValue` by default, only when reorder on access has been specified in the constructor. Besides that, no thread safety guarantees have been specified for `HashMap`, even if you are only using `Entry.setValue`. – Holger Oct 22 '18 at 16:14
  • Thanks about the LinkedHashMap. About the HashMap _pure_: there is no reason to assume anything else problematic than concurrent access on the value field: for a setValue the map is not concerned nor is the entry moved. As possible base class it cannot guarantee anything. – Joop Eggen Oct 23 '18 at 07:49