3

I considered two collections with a similar concept - ParHashMap from Scala and ConcurrentHashMap from Java. Both of them have the same time complexity and both of them are thread safe and lock-free, but they only are based on different concepts under the hood - trie and hash table accordingly. And this reasoning leads to question: why do we need for ParHashMap from Scala while there is ConcurrentHashMap from Java?

pacman
  • 797
  • 10
  • 28
  • ConcurrentHashMap is not lock free – talex Jan 09 '17 at 23:06
  • @talex You are wrong - https://en.wikipedia.org/wiki/Java_ConcurrentMap#Lock-free_atomicity – pacman Jan 09 '17 at 23:08
  • @talex I think it used to be non-lock-free but that changed in Java 8. – Louis Wasserman Jan 09 '17 at 23:11
  • @ Louis Wasserman `ConcurrenHashMap` always was `lock-free`, but in Java 7 it had an another structure with Node via leveraging a `Lock stripping` concurrent pattern, but in Java 8 it was changed - this approach was disabled and `rb tree` in case of collision would build. – pacman Jan 09 '17 at 23:15
  • @LouisWasserman I looked into source code and found `synchronized` keyword in implementation of `putVal` method. I think it mean it have lock. It can't be lock-free and have lock at the same tame. – talex Jan 10 '17 at 11:57
  • @talex `syncronized` block doesn't matter, methods from `Unsafe`, like `CAS` provide non-blocking features, `synchronized` blocks are one of the step of CHM algorithm behaving. Please check an algorithm of CHM for different operartions. – pacman Jan 10 '17 at 12:07
  • 2
    It provides lock-free queries, certain lock-free update operations and concurrent updates of different keys. But when performing atomic updates on the same key, synchronization is unavoidable. – Holger Jan 16 '17 at 16:36

1 Answers1

5

ConcurrentHashMap is a thread safe Map<> implementation. If you have multiple threads accessing it at the same time they will be in sync.

ParHashMap is a parallel collection. If you execute operations here (like map(), filter(), aggregate()) Scala will parallelize it for you (similar to Spark but only within a single JVM).

To summarize, ConcurrentHashMap gives the primitive to synchronize threads for concurrency, ParHashMap takes care of both sync and execution.

Edit: Note that ParHashMap is not itself necessarily thread-safe. The idea is to call its methods from a single thread and let the parallelism be handled by the parallel data structure itself.

marios
  • 8,874
  • 3
  • 38
  • 62
  • @ You said: "If you execute operations here (like map(), filter(), aggregate()) Scala will parallelize it for you" Doest it work similar to Spark parallelization mechanism? – pacman Jan 10 '17 at 09:44
  • Yes, similar, but only within a single machine. You can specify a thread pull to be used. – marios Jan 10 '17 at 16:57
  • 1
    Actually the inventor of Spark said that he drew inspiration from the parallel collection of Scala while inventing the RDD abstraction. – marios Jan 10 '17 at 17:30
  • But if compare `ParHashMap` and `CHM` without methods like map, filter and etc., only like a thread safe storage of values, are they almost same? – pacman Jan 10 '17 at 17:43
  • Probably CHM will be faster. Par collections are in general going to have a significant face lift in scala 2.13. I want to see where they wanna take them now that Spark is in the picture. – marios Jan 10 '17 at 17:48
  • Does `ParHashMap` provide an exactly feature like `CHM` for thread-safe element storing? – pacman Jan 10 '17 at 18:33
  • 2
    I think your best bet for a thread safe mutable hashmap in scala is to use the implementation from here: http://stackoverflow.com/a/17542165/1553233. Using the ParHashMap for this is not the right tool for the job (even if you can force it to be). – marios Jan 10 '17 at 18:53
  • 2
    Starting with Java 8, `ConcurrentHashMap` also supports several parallel processing methods, see for the methods starting with `forEach…`, `reduce…` or `search…` (or generally all methods having a first parameter `long parallelismThreshold`). – Holger Jan 16 '17 at 16:45
  • @Holger, that's awesome, I didn't know about that. Thanks! – marios Jan 16 '17 at 17:48
  • Are you sure that ParHashMap is thread-safe (i.e. can handle concurrent mutations)? This answer says it is not: https://stackoverflow.com/a/5926468/14955 – Thilo Nov 09 '17 at 00:40
  • 1
    I didn’t say it’s thread safe. Probably it’s not. I don’t think you would want to call this from multiple threads. It kind of beats the purpose. – marios Nov 09 '17 at 01:03
  • Thanks for the clarification. "takes care of sync" made it sound like that a little. – Thilo Nov 09 '17 at 02:47
  • Thanks for the suggestion. I updated the answer to make it more clear. – marios Nov 09 '17 at 03:24