1

I am just simulating an api with a fake database in memory and using scala.collection.mutable.HashMap[Int, AnyRef]

  1. Which is the best collection to support concurrent inserts? There is a better choice?

  2. Suppose I need another kind of collection like Map[Int, AnyRef], but this time the keys need to be sorted. A TreeMap is the best choice?

Thanks in advance

adamwy
  • 1,239
  • 1
  • 7
  • 12
coffee
  • 3,048
  • 4
  • 32
  • 46

3 Answers3

1

Which is the best collection to support concurrent inserts? There is a better choice?

Use immutable data structures

TreeMap is the best choice?

Yes.

Straight forward way to achieve thread safety is by using immutable data structures.

Scala provides immutable data structures. Just import scala.collection.immutable._.

In order to be sorted use scala.collection.immutable.TreeMap

This post tells about how to use TreeMap and provide custom ordering

Community
  • 1
  • 1
Nagarjuna Pamu
  • 14,737
  • 3
  • 22
  • 40
1

You have two options here.

You can use immutable data structures like scala.collection.immutable.HashMap, which provides highly efficient immutable hash map. You also need to remember that every update to this map needs to be synchronized like this:

object Database {
  private var map = new immutable.HashMap[Int, AnyRef]
  def get(index: Int) = map(index)
  def insert(index: Int, value: AnyRef) = 
    synchronized(map = map.updated(index, value))
}

The other way is to use concurrent mutable map, like scala.collection.concurrent.TrieMap, which doesn't require additional locking:

object Database {
  private val map = new concurrent.TrieMap[Int, AnyRef]
  def get(index: Int) = map(index)
  def insert(index: Int, value: AnyRef) = map.put(index, value)
}
adamwy
  • 1,239
  • 1
  • 7
  • 12
  • You mean in the first example? `var` is necessary here since you need to update `Database` state somehow. – adamwy Dec 25 '16 at 15:06
  • Don't pin to concrete implementations unless you have to for some specific reason: `map = Map.empty[Int, AnyRef]` or `map = SortedMap.empty[Int, AnyRef]` – Dima Dec 25 '16 at 15:17
0

I disagree with the above advise. If you are going to have mutable state anyway, you are better off isolating it inside the data container rather than replacing the container itself every time.

You are better off using java containers for this purpose. For a hashmap, java ConcurrentHashMap is your best choice. For a sorted implementation, you'll have to synchronize explicitly:

 object DB {
   import java.util._
   val hashed = new concurrent.ConcurrentHashMap[String, AnyRef]
   val sorted = Collections.synchronizedMap(new TreeMap[Int, AnyRef])
}

You can import scala.collection.JavaConversions._ to implicitly convert those into scala Maps, to get goodies, like map, filter etc., But ... you probably should not. Using any of those wouldn't be a good idea under concurrency in 99% of cases. Anything, other than you regular get and put (and, put/computeIfNotExists for the ConcurrentHashmap case) primitives would be non-trivial to implement and dangerous to use.

Think of these as just primitive key-value containers, not full-blown scala collections.

Dima
  • 39,570
  • 6
  • 44
  • 70
  • 1
    "If you are going to have mutable state anyway, you are better off isolating it inside the data container rather than replacing the container itself every time." Could you elaborate why? Immutable structures like HAMT are not making a copy of all data for a change, only a small part of it. Also OP only needs to create fake/mock database most likely for testing purposes, so it doesn't seem that performance is that critical here. – adamwy Dec 25 '16 at 15:31
  • I'll add that Scala's `scala.collection.concurrent.TrieMap` is likely more performant since all of its operations are lock-free underneath, while Java `ConcurrentHashMap` seems to be locking on put operations. – adamwy Dec 25 '16 at 15:34
  • Well, performance aside (immutable collections are indeed a bit faster than concurrent map), it is just less transparent semantically, and harder to reason about, when your entire "database" is replaced every time you make a change to it. If someone happens to keep a reference to old map, after you replace it with an updated one, it becomes invalid. If two updates happen concurrently and aren't atomic, you have to serialize them anyway, but semantics of immutable maps don't make it clear that it is necessary. According to benchmarks, `TrieMap` is actually slower than `ConcurrentHashMap` – Dima Dec 25 '16 at 19:44