
I want to convert a Map into a ConcurrentHashMap using the Java 8 Stream and Collector interfaces, and there are two options I can use.

The first:

Map<Integer, String> mb = persons.stream()
                                 .collect(Collectors.toMap(
                                            p -> p.age, 
                                            p -> p.name, 
                                            (name1, name2) -> name1+";"+name2,
                                            ConcurrentHashMap::new));

And the second:

Map<Integer, String> mb1 = persons.stream()
                                  .collect(Collectors.toConcurrentMap(
                                             p -> p.age, 
                                             p -> p.name));

Which one is the better option? When should I use each option?

Eran
KayV

2 Answers


There is a difference between them when dealing with parallel streams.

toMap -> a non-concurrent collector

toConcurrentMap -> a concurrent collector (this can be seen from the characteristics each collector reports).
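The characteristics mentioned above can be inspected directly. The following is a minimal sketch; the class and method names are only illustrative, and the key/value mappers are arbitrary:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {
    // Characteristics reported by the two-argument toMap collector.
    static Set<Collector.Characteristics> toMapFlags() {
        Collector<String, ?, Map<Integer, String>> c =
                Collectors.toMap(String::length, s -> s);
        return c.characteristics();
    }

    // Characteristics reported by the two-argument toConcurrentMap collector.
    static Set<Collector.Characteristics> toConcurrentMapFlags() {
        Collector<String, ?, ConcurrentMap<Integer, String>> c =
                Collectors.toConcurrentMap(String::length, s -> s);
        return c.characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toMap:           " + toMapFlags());
        System.out.println("toConcurrentMap: " + toConcurrentMapFlags());
    }
}
```

On a standard JDK, only the toConcurrentMap collector should list CONCURRENT (together with UNORDERED) among its characteristics.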

The difference is that toMap will create multiple intermediate results and then merge them together (the Supplier of such a Collector will be called multiple times), while toConcurrentMap will create a single result and each Thread will throw results at it (the Supplier of such a Collector will be called only once).

Why is this important? This deals with insertion order (if that matters).

toMap will insert values in the resulting Map in encounter order by merging multiple intermediate results (the Supplier of that collector is called multiple times, as is the Combiner).

toConcurrentMap will collect elements in any order (undefined) by throwing all elements at a common result container (ConcurrentHashMap in this case). Supplier is called only once, Accumulator many times and Combiner never.
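One way to observe the Supplier behaviour described above is to pass a counting map supplier to the four-argument overloads of both collectors and run them on a parallel stream. This is only a sketch: the class name, the counter, and the sample data are made up for illustration, and the exact count for toMap depends on how the stream is split on a given machine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SupplierCountDemo {
    // Counts how many result containers toMap creates on a parallel stream.
    static int toMapSupplierCalls() {
        AtomicInteger calls = new AtomicInteger();
        Supplier<Map<Integer, Integer>> countingSupplier = () -> {
            calls.incrementAndGet();
            return new HashMap<>();
        };
        IntStream.range(0, 10_000).boxed().parallel()
                .collect(Collectors.toMap(i -> i % 100, i -> 1, Integer::sum, countingSupplier));
        return calls.get();
    }

    // Counts how many result containers toConcurrentMap creates on a parallel stream.
    static int toConcurrentMapSupplierCalls() {
        AtomicInteger calls = new AtomicInteger();
        Supplier<ConcurrentMap<Integer, Integer>> countingSupplier = () -> {
            calls.incrementAndGet();
            return new ConcurrentHashMap<>();
        };
        IntStream.range(0, 10_000).boxed().parallel()
                .collect(Collectors.toConcurrentMap(i -> i % 100, i -> 1, Integer::sum, countingSupplier));
        return calls.get();
    }

    public static void main(String[] args) {
        // Typically several containers, one per leaf task of the parallel reduction.
        System.out.println("toMap supplier calls:           " + toMapSupplierCalls());
        // A single shared container that all threads accumulate into.
        System.out.println("toConcurrentMap supplier calls: " + toConcurrentMapSupplierCalls());
    }
}
```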

The small caveat here is that a CONCURRENT collector skips the merger only when ordering does not matter: either the stream has to be unordered (via an explicit unordered() call, or because the source of the stream is not ordered, a Set for example), or the collector itself has to report the UNORDERED characteristic.
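The caveat can be checked with a collector that is CONCURRENT but deliberately not UNORDERED. The collector below is hypothetical and written only for this illustration (it is not part of the JDK); the sketch counts Supplier calls for an ordered and an unordered parallel stream:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ConcurrentOnlyCollectorDemo {
    // Hypothetical collector: CONCURRENT, but without the UNORDERED characteristic.
    static Collector<Integer, ?, ConcurrentHashMap<Integer, Integer>> concurrentOnly(AtomicInteger supplierCalls) {
        return Collector.of(
                () -> {
                    supplierCalls.incrementAndGet();
                    return new ConcurrentHashMap<Integer, Integer>();
                },
                (map, i) -> map.merge(i % 100, 1, Integer::sum),
                (left, right) -> {
                    right.forEach((k, v) -> left.merge(k, v, Integer::sum));
                    return left;
                },
                Collector.Characteristics.CONCURRENT);
    }

    // Runs the collector on a parallel stream and reports how often the Supplier ran.
    static int supplierCalls(boolean unordered) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> stream = IntStream.range(0, 10_000).boxed().parallel();
        if (unordered) {
            stream = stream.unordered();
        }
        stream.collect(concurrentOnly(calls));
        return calls.get();
    }

    public static void main(String[] args) {
        // Ordered stream: falls back to the regular reduction with intermediate containers.
        System.out.println("ordered:   " + supplierCalls(false));
        // Unordered stream: a single shared container, so the Combiner is not needed.
        System.out.println("unordered: " + supplierCalls(true));
    }
}
```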

Eugene
  • Is it going to be different if the stream is not parallel? The only difference I see is that characteristics flags are added to the CollectorImpl, which is then evaluated according to those flags (insertion order). If the stream is not parallel, both methods act the same, except that one produces a HashMap and the other a ConcurrentHashMap. I mean it only affects the current stream's behaviour if it is parallel. – Ömer Erden Nov 30 '16 at 13:01
  • @Bolzano In a non-parallel stream, the Combiner will not get called for either implementation. The merging will be done in the Accumulator (called once per element, while the Supplier will be called only once). So yes, the only visible difference is the result you get: a HashMap vs a ConcurrentHashMap. Under the hood, toMap calls HashMap::merge, while toConcurrentMap calls ConcurrentHashMap::merge. So if you are doing serial processing, why would you ever want a ConcurrentHashMap? – Eugene Nov 30 '16 at 13:08
  • We need to ask @KaranVerma. In his question he is not using parallel streams; I am assuming he is trying to cache the values in a ConcurrentHashMap to use them at some other time (for thread-safe operations), and his question is simply: "what's the difference between these 2 code blocks?" – Ömer Erden Nov 30 '16 at 13:19
  • @Bolzano I am just learning Java 8 streams. I am not using parallel streams at all currently. – KayV Nov 30 '16 at 16:07

From toMap's Javadoc:

The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are inserted into the Map in encounter order, using toConcurrentMap(Function, Function) may offer better parallel performance.

toConcurrentMap doesn't insert the results into the Map in encounter order, but it is supposed to give better performance.

If you are using a parallel stream and don't care about the insertion order, toConcurrentMap is the recommended choice.
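Where the insertion order becomes visible is the merge function. The sketch below reuses the question's merge function on a parallel stream in which every person has the same age; the Person class and the sample data are assumptions made for this example, since the question does not show them:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class MergeOrderDemo {
    // Minimal stand-in for the question's Person type (fields assumed).
    static final class Person {
        final int age;
        final String name;

        Person(int age, String name) {
            this.age = age;
            this.name = name;
        }
    }

    // 200 persons that all share one key, so every element goes through the merge function.
    static List<Person> persons() {
        return IntStream.range(0, 200)
                .mapToObj(i -> new Person(30, "n" + i))
                .collect(Collectors.toList());
    }

    static String mergedWithToMap() {
        Map<Integer, String> byAge = persons().parallelStream()
                .collect(Collectors.toMap(p -> p.age, p -> p.name, (a, b) -> a + ";" + b));
        return byAge.get(30);
    }

    static String mergedWithToConcurrentMap() {
        Map<Integer, String> byAge = persons().parallelStream()
                .collect(Collectors.toConcurrentMap(p -> p.age, p -> p.name, (a, b) -> a + ";" + b));
        return byAge.get(30);
    }

    // The names joined in encounter order, for comparison.
    static String encounterOrder() {
        return persons().stream().map(p -> p.name).collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        // true: toMap merges intermediate results in encounter order.
        System.out.println(mergedWithToMap().equals(encounterOrder()));
        // May be true or false: the names are all present, but their order is undefined.
        System.out.println(mergedWithToConcurrentMap().equals(encounterOrder()));
    }
}
```

If the merged value must not depend on thread timing, toMap is the safer choice; if only the set of merged elements matters, toConcurrentMap avoids the cost of merging intermediate maps.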

Eran