4

I have the code of Scala

class MyClass {
  private val myData: Map[String, MyClass2] = new HashMap[String, MyClass2]()

  def someMethod = {
      synchronized(myData) {
        val id = getSomeId
        if (myData.containsKey(id)) myData = myData.remove(id)
        else Log.d("123", "Not found!")
      }
  }

  def getSomeId = //....
}

I wonder, is it possible to remain this code thread-safe without using synchronized and without involving some other libraries such Akka or any other libraries (classes) even built-in in Java or Scala?

Ideally, I'd like to make thread-safe only by using the conception of immutability (final in Java, if you will).

UPDATE:

class MyClass(myData: Map[String, MyClass2] = new HashMap[String, MyClass2]()) {

  def someMethod = {
      synchronized(myData) {
        val id = getSomeId
        if (myData.containsKey(id)) new MyClass(myData.remove(id))
        else {
           Log.d("123", "Not found!")
           this
         }
      }
  }

  def getSomeId = //....
}
Petr
  • 62,528
  • 13
  • 153
  • 317
Alan Coromano
  • 24,958
  • 53
  • 135
  • 205
  • Your code doesn't compile. Are you using an immutable map? Then `myData` must be a `var`. Anyway, the answer is "no". You need either a lock or use any other concurrent tool like atomic CAS. – 0__ Jun 15 '13 at 13:09
  • @0__ look at my update, now at least it should compile. – Alan Coromano Jun 15 '13 at 13:26
  • You want to use path-copying there (make `MyClass` an immutable case class itself), as shown in the answer by Petr Pudlák. Then you move the synchronization responsibility to the instance that uses `MyClass` (and you can ask the question again :) – 0__ Jun 15 '13 at 15:17
  • @0__ hmm, you're right... if I understood you correctly. – Alan Coromano Jun 15 '13 at 15:38
  • @0__ would you mind giving me an example of what you said (`move the synchronization responsibility to the instance that uses MyClass`)? I'm not sure I understand the issue. – Alan Coromano Jun 16 '13 at 01:28
  • Say you use path copying, e.g. `case class Foo(map: Map[String, Bar])`. This is thread safe because fully immutable. But you will track changes on the use site: `var foo = Foo(Map.empty); foo = foo.copy(map = foo.map + ("bar" -> new Bar {}))`; so you need to think about concurrency when changing the `foo` variable, and so forth. – 0__ Jun 16 '13 at 09:08
  • @0__ can you write this as an answer? – Alan Coromano Jun 16 '13 at 09:19

4 Answers4

4

You can solve the problem with immutability only if you make MyClass immutable too (and let it use only immutable data structures as well). The reason is simple: If MyClass is mutable, then you have to synchronize modifications by concurrent threads.

This requires a different design - every operation that causes an instance of MyClass to "change" will instead return a (possibly) modified instance.

import collection.immutable._

class MyClass2 {
  // ...
}

// We can make the default constructor private, if we want to manage the
// map ourselves instead of allowing users to pass arbitrary maps
// (depends on your use case):
class MyClass private (val myData: Map[String,MyClass2]) {
  // A public constructor:
  def this() = this(new HashMap[String,MyClass2]())

  def someMethod(id: String): MyClass = {
    if (myData.contains(id))
      new MyClass(myData - id) // create a new, updated instance
    else {
      println("Not found: " + id)
      this // no modification, we can return the current
            // unmodified instance
    }
  }

  // other methods for adding something to the map
  // ...
}
Viktor Klang
  • 26,479
  • 7
  • 51
  • 68
Petr
  • 62,528
  • 13
  • 153
  • 317
  • it seems this is exactly what I was looking for! – Alan Coromano Jun 15 '13 at 15:34
  • what does private mean here -- `class MyClass private`? – Alan Coromano Jun 15 '13 at 15:35
  • 1
    @MariusKavansky It means that the default constructor `MyClass(Map[String,Class2])` is [`private`](http://stackoverflow.com/a/1730622/1333025). So external users of the class must use `new MyClass()`, not `new MyClass(someMyDataMap)`, which can be desirable, if we want complete control of the content of the map. – Petr Jun 15 '13 at 19:21
2

If you use a TrieMap from scala 2.10, which is a lock free and concurrent map implementation, you can avoid synchronizing:

import scala.collection.concurrent.TrieMap

class MyClass2

class MyClass {
    private val myData = TrieMap[String, MyClass2]()
    def someMethod = myData -= getSomeId
    def getSomeId = "id"
}
gnf
  • 483
  • 3
  • 8
  • But he has two subsequent calls, `containsKey` and `remove`. So if you want to have that thread safe without locking / CAS, you must use immutable structures up to and including `MyClass`. If you add `containsKey` here, you still have a race condition. – 0__ Jun 15 '13 at 15:14
1

I would recommend using a library, because getting concurrency right on your own is hard. You could for example use a concurrent map like TrieMap. See the answer above.

But let's assume that you want to do this manually for educational purposes. The first step to make the above thread-safe would be to use an immutable collection. So instead of

private val myData: Map[String, MyClass2] = new HashMap[String, MyClass2]()

you would use

private var myData = Map.empty[String, MyClass2] 

(Even though there is a var here, this has less mutable state than the version above. In this case the only mutable thing is a single reference, whereas in the example above the entire collection is mutable)

Now you have to deal with the var. You have to make sure that an update to the var on one thread is "seen" on all other threads. So you have to mark the field as @volatile. That would be enough if you have a publish/subscribe scenario where writes are only done from one thread. But assuming that you want to both read and write from different threads, you will need to use synchronized for all write accesses.

Clearly, this is enough complexity to warrant introducing a little helper class:

final class SyncronizedRef[T](initial:T) {
  @volatile private var current = initial

  def get:T = current

  def update(f:T=>T) {
    this synchronized {
      current = f(current)
    }
  }
}

With this little helper, the code from above can be implemented like this:

class MyClass {
  val state = new SyncronizedRef(Map.empty[String, MyClass2])

  def someMethod = {
    state.update(myData =>
      val id = getSomeId
      if (myData.containsKey(id)) 
        myData - id
      else { 
        Log.d("123", "Not found!")
        myData
      }
  }

  def getSomeId = //....
}

This would be thread-safe as far as the map is concerned. However, whether the whole thing is threadsafe depends on whatever happens in getSomeID.

In general, this way of dealing with concurrency will work as long as the thing passed to update is a pure function that just transforms the data without having any side effects. If your state is more complex than a single map, it can be quite challenging to write your updates in a purely functional style.

There are still low level multithreading primitives in the SynchronizedRef, but the logic of your program is completely free of them. You just describe how the state of your program changes in response to external input by composing pure functions.

In any case, the best solution for this particular example is just to use an existing concurrent map implementation.

Rüdiger Klaehn
  • 12,445
  • 3
  • 41
  • 57
  • it seems you have done a lot of work, but nonetheless that's not what I'm looking for -- **I'd like to make thread-safe only by using the conception of immutability** – Alan Coromano Jun 15 '13 at 14:02
  • besides, it involves `synchronized` operator, which is also I'd like to avoid. However, yes, I need an example **made manually for educational purposes**. – Alan Coromano Jun 15 '13 at 14:04
  • There is no way to avoid using low level synchronization primitives somewhere or use libraries that do it for you. You have to tell the CPU that a write needs to be visible on the other CPUs. But note that the actual logic (the transform function) is free from low level synchronization. – Rüdiger Klaehn Jun 15 '13 at 14:08
  • as Peter has shown, it's possible not to use synchronized keyword. Isn't that so? – Alan Coromano Jun 15 '13 at 14:44
1

Whenever you share mutable state, you need a concurrency mechanism. As Rüdiger points out, the best is to identify which kind of concurrency scenario you have, and then use an existing tool that best addresses that scenario:

  • plain old Java synchronized locks
  • atomic compare-and-swap
  • software transactional memory (e.g. Scala-STM)
  • message-passing / actors

Or of course, you could use cooperative multitasking (run concurrent processes on a single shared thread), if you don't need highest possible performance.

If the state of your class is overseeable, you can make that class completely immutable, e.g.:

case class MyClass2()

case class MyClass(myData: Map[String, MyClass2] = Map.empty) {
  def someMethod = {
    val id = getSomeId
    if (myData.contains(id)) copy(myData = myData - id)
    else throw new IllegalArgumentException(s"Key $id not found")
  }

  def getSomeId = "foo" // ...
}

An instance of MyClass is only mutated by copying the data to a new instance, so multiple threads can safely refer to the same instance. But on the other hand, if two threads A and B start off with the same instance foo1, and either of them mutates it and wants that mutation to be seen by the other thread, then you need to share that mutated state again in some form (use an STM ref cell, send a message via an actor, store it in a synchronized variable etc.)

val foo1 = MyClass(Map("foo" -> MyClass2()))
val foo2 = foo1.someMethod  // foo1 is untouched
0__
  • 66,707
  • 21
  • 171
  • 266
  • please explain what you meant by this `Then you move the synchronization responsibility to the instance that uses MyClass (and you can ask the question again :)`. I can't figure it out. With an example if it's not a big deal. – Alan Coromano Jun 16 '13 at 09:35
  • See [this post](http://alvinalexander.com/scala/scala-case-class-copy-method-example) for example. It is just a convenience to create a new instance of a case class where some arguments are replaced. In this case, where there is only one argument (`myData`), it doesn't buy you much. You could equally say `MyClass(myData - id)` – 0__ Jun 16 '13 at 09:36
  • You see, you had `foo1`, now you have `foo2` with the changed content. You could have said `var foo = MyClass(...); foo = foo.someMethod`, replacing the contents of that variable. Now you are going to access `foo` from somewhere in your code. If that may happen from different threads, you need again to ensure that they are playing nicely together if they wish to update `foo`. So you just moved your concurrency concerns to a different spot. – 0__ Jun 16 '13 at 09:39