Relational data using hash tables

Question

I am trying to keep in memory a set of data structures identified by the following schema. I have a crude solution, but I'm looking for some better ideas. Performance and reliability is critical important, memory not so much (within reason), since the tables are fairly small (maximum a couple of hundred entries, more likely a few dozens). I would rather not use an in-memory DB for such a small data set, I think.

What I'm trying to accomplish is the ability to quickly query All B entries based on A.Name, all A entries based on B.Name, or all A entries based on T.Tag (I don't really need all B entries based on T.Tag currently, but could be useful in the future)

Currently I use three tables with duplicate data, and with the synchronization issues that that brings, and when I have a new piece of data, I store it three different ways. I am sure there has to be a better way.

// all A entries matching a tag
val Tag2A= new MutableHashMap[String, MutableSet[String]]() with MutableMultiMap[String, String] 

// all B entries matching a tag
val Tag2B = new MutableHashMap[String, MutableSet[List[Int]]]() with MutableMultiMap[String, List[Int]]

// all Tags matching a A entry
val A2Tag = new MutableHashMap[String, MutableSet[String]]() with MutableMultiMap[String, String]

Could someone recommend a more elegant solution?

EDIT: (clarification) My MutableMultiMap and MutableSet are just mutable.MultiMap and mutable.Set aliased at import time.

EDIT2: the tables need to be modifiable (add/remove).

score 1 · Accepted Answer · edited May 23 '17 at 12:06

Assuming you can load everything into a memory, immutable solution is acceptable:

 abstract class A2B(tag: String) {
   def aMap: Map[String, A]
   def bMap: Map[String, B]
 }

 case class A(id: String, name: String, tag: A2B, payload: String)

 case class B(id: String, name: String, tag: A2B, payload: List[Int])

You can initialize it like that (to solve chicken-egg problem):

def getA2b(name: String): A2B = new A2B(name) {
    val aMap = { //you can query your external data source tableA here
        val a1 = "a1" -> A("a1", "a1", this, "aaaa")
        val a2 = "a2" -> A("a2", "a2", this, "aaaa")
        Map(a1, a2)
    }

    val bMap = { //you can query your external data source tableB here
        val b1 = "b1" -> B("b1", "b1", this, Nil)
        val b2 = "b2" -> B("b2", "b2", this, Nil)
        Map(b1, b2)
    } 

    override def toString = name
}


val a2b = Map("a2b1" -> getA2b("a2b1")) //you can query your external data source tableA2B here

And query with constant access time (sorry have no Scala REPL on current machine yet):

println(a2b("a2b1"))
println(a2b("a2b1").aMap)
println(a2b("a2b1").aMap("a1").tag.bMap)

a2b1                                                                                                                                                                                                                                                   
Map(a1 -> A(a1,a1,a2b1,aaaa), a2 -> A(a2,a2,a2b1,aaaa))                                                                                                                                                                                                
Map(b1 -> B(b1,b1,a2b1,List()), b2 -> B(b2,b2,a2b1,List()))

All relations here are modeled with links, so no overhead. The structure is immutable, so it's thread-safe. You can also notice that A2B class gets initialized inside constructor (and all vals are final by default), so no synchronization problems according to JSR-133 - your application always see final version of A2B, so no need for volatiles.

Thanks, this helps. The only limitation I see is that my data is coming from a stream, so I won't be able to load everything into memory immediately. So the dataset will be modified (add/remove) over the life of the stream. I realize I didn't include that in my question, and I apologize. I am trying to modify your answer to support this, but my chicken is fighting my egg. — Will I Am, Oct 03 '15 at 03:13
@WillIAm If your dataset is really small and you don't care about datasource overhead - you can just reload everything (a2b) if something had changed (or even not everything - just add delta to your new a2b based on previous one - make sure that it is atomic, no concurrent update on a2b then). As long as your data is read-only and eventually consistent - you won't have any concurrency problems with "var a2b = ". In case of strong consistency you may use Actor (or agent) or pass immutable a2b implicitly into all functions (they will pick up new data on next message if it's ok for you). — dk14, Oct 03 '15 at 03:25
Thanks. I need to figure out how to do integrate this with my actor (I'm inside an actor, and my changes are coming as actor tells), but it looks like what I need. What would be the most efficient way to modify these maps from an actor Receive block? — Will I Am, Oct 04 '15 at 04:57

Relational data using hash tables

1 Answers1