4

I want to create a thread-safe container that uses a Scala Map as a backing store. Rather than expose the user to the underlying Map, I would rather only expose a subset of its methods.

Example might look something like the following...

class MyContainer[A] {

  def add(thing: A): Unit = {
    backingStore = backingStore + (thing.uuid -> thing)
  }

  def filter(p: A => Boolean): Option[Iterable[A]] = {
    val filteredThings = backingStore.values.filter(p)
    if (filteredThings.isEmpty) None else Some(filteredThings)
  }

  def remove(uuid: UUID): Option[A] = backingStore.get(uuid) match {
    case optionalThing @ Some(thing) =>
      backingStore = backingStore - uuid; optionalThing
    case None => None
  }

  @ volatile private[this] var backingStore = immutable.HashMap.empty[UUID, A]

}

...I suspect that even though the underlying backing store is immutable and its reference is volatile, the container is not thread-safe.

Suppose that I have two separate threads running with access to an instance of the above container. Thread 1 filters the underlying collection and gets some results; at the same time Thread 2 removes an item. The results that thread one has might contain a reference to the item that Thread 2 removed? There might be other problems.

Am I correct that the above implementation is not thread-safe? What would be the most idiomatic way to make the above thread-safe using Scala?

Edit: I would prefer to avoid blocking and synchronization if possible. If blocking/synchronization must be used then is the volatile reference needed? And what would be the point of the immutable collection? Couldn't I just as well use a mutable collection?

Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
davidrpugh
  • 4,363
  • 5
  • 32
  • 46

2 Answers2

3

You are using a copy-on-write approach, so your problem of a concurrent read and write is that they are not strictly ordered, but that's not really a problem: it's simply a timing issue in that if A is writing while B is reading there is no guarantee about whether A will see B's edits.

Your real problem is when you have C and D writing simultaneously: then they can both read the same starting map, update their own copies and then write only their own edits. Whoever writes first will have their changes overwritten.

Consider a starting map containing (A,B), and threads C and D adding entries 'C' and 'D' trespectively, while threads E anf F read the map; all this happenning concurrently. One possible reuslt is:

C reads map (A,B)
D reads map (A,B)
C writes map (A,B,C)
E reads map (A,B,C)
D writes map (A, B, D)
F reads map (A, B, D)

The 'C' entry appeared trnasiently and was then lost forever.

The only way to reliably sequence the writes is to ensure it is never entered concurrently. Either with a synchronize locke nforce single entry the write block or ensure it is serialised by using a single Akka actor to perform updates.

You need to synchronize reads also if you care about ordering of reads vs writes, but if you have multiple threads accessing this, that's unlikely to be a real concern.

Iadams
  • 536
  • 3
  • 7
  • @ladams If I understand you correctly I need to sychronize the add/remove (i.e., write) methods; but for read methods like filter I do not need to synchronize (unless I really care about the ordering of read and writes). – davidrpugh Sep 18 '16 at 10:17
  • @ladams If I only sychronize the `add` and `remove` methods, then I still need to annotate the reference to the backing store as @volatile correct? – davidrpugh Sep 19 '16 at 01:56
  • Yes, you need to synchronize both the add/remove methods. Marking the reference as volatile helps the timeliness of update visibility: if you don't sdo this, a reader thread may retain a reference to an old version of the backing store for an indeterminate amount of time after other thread(s) have updated it, potentially multiple times. To ensure visibility to other threads asap you need to do two things: make the writer cause a memory write from processor cache to main memory (which happens on exit of a synchronize block), make the reader read it into its cache, which volatile forces. – Iadams Sep 19 '16 at 07:45
1

Am I correct that the above implementation is not thread-safe?

Yes. It is not thread safe. But it does have the right memory visibility semantics.

For simplicity you could make it thread safe by:

class MyContainer[A <: {def uuid: UUID}] {

  def add(thing: A): Unit = this.synchronized{
    backingStore = backingStore + (thing.uuid -> thing)
  }

  def filter(p: A => Boolean): Option[Iterable[A]] = this.synchronized{
    val filteredThings = backingStore.values.filter(p)
    if (filteredThings.isEmpty) None else Some(filteredThings)
  }

  def remove(uuid: UUID): Option[A] = this.synchronized{
    backingStore.get(uuid) match {
      case optionalThing @ Some(thing) =>
        backingStore = backingStore - uuid; optionalThing
      case None => None
    }
  }

  import scala.collection.immutable.HashMap
  private[this] var backingStore = HashMap.empty[UUID, A]
}
Community
  • 1
  • 1
Jatin
  • 31,116
  • 15
  • 98
  • 163
  • What does this syntax mean: `[A <: {def uuid: UUID}]` ? Have never seen this. – Samar Sep 16 '16 at 10:05
  • @Samar It means that a [structural type](https://twitter.github.io/scala_school/advanced-types.html#structural) containing a method called `uuid` is an upper bound to A. – Yuval Itzchakov Sep 16 '16 at 10:44
  • Enlightened :) Thanks Yuval! – Samar Sep 16 '16 at 10:48
  • @Jatin Am I correct that `this.synchronized` blocks? Is it possible to solve this problem without blocking? Note that I would be open to a redesign of `MyContainer` if it is possible to avoid blocking. – davidrpugh Sep 16 '16 at 17:27
  • @davidrpugh You could avoid some blocking by using https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html. You could have read locks at `filter` and write lock in remove and add. You could instead have a spin based lock instead of blocking. But in that case it would keep spinning till it finds access to lock – Jatin Sep 16 '16 at 18:29
  • @Jatin I understand the need for synchronization or locks for add/remove methods, but I am not sure that I understand the need for sychronizing or locks on the filter method given that `backingStore` is immutable and its reference is volatile. Can you explain? – davidrpugh Sep 17 '16 at 11:31
  • You need them for atomicity. If a thread calls one method and other thread calls another, then for sanity of data you would need synchronization. – Jatin Sep 18 '16 at 07:04