4

I have a large map that won't fit in memory and I thus want it to live on the disk. I have the following options:

  1. Use a pure Java library like MapDB: This works but I don't get Scala collections sugar like getOrElseUpdate and ++= and the apply/update methods. I could create my own wrapper class around MapDB in Scala but I really don't want to manually implement all the Map traits.
  2. Use something like redis/memcached: I could not find a good scala library that gave me the Map traits for redis or memcached. This may be the better performance solution but it introduces a complexity of running a db

So is there any nice scala only library that implements the Scala collection sugars for maps and yet it falls back on disk and/or a key-value store for large maps?

pathikrit
  • 32,469
  • 37
  • 142
  • 221
  • 1
    It looks like the implementations extend [`java.util.AbstractMap`](http://docs.oracle.com/javase/6/docs/api/java/util/AbstractMap.html) which implements `Map` - it seems like there should be a generic wrapper/bridge around Java's `Map` that could be used. And, if one can't be used, why not? :D –  Feb 09 '13 at 00:20
  • 1
    e.g. see http://stackoverflow.com/questions/1027868/how-to-convert-from-from-java-util-map-to-a-scala-map –  Feb 09 '13 at 00:23
  • The question is generalized: How to access `java.util.Map` as `scala.mutable.Map[K,V]`? I did a search for `java map scala immutable.map` and was rewarded with http://docs.scala-lang.org/overviews/collections/conversions-between-java-and-scala-collections.html –  Feb 09 '13 at 01:37

2 Answers2

7

Answered my own question

import collection.mutable
import org.mapdb.DBMaker
import collection.JavaConversions._

val cache: mutable.Map[String, Seq[String]] = DBMaker.newTempHashMap[String, Seq[String]]()
pathikrit
  • 32,469
  • 37
  • 142
  • 221
  • Shouldn't that be `val cache: collection.concurrent.Map[String, Seq[String]]` to have access to concurrent methods like `putIfAbsent`? – nilskp Dec 17 '13 at 12:35
0

Worth mentioning Chronicle Map: it's basically the same thing as MapDB (disk-persisted ConcurrentMap implementation), but it's much more efficient than MapDB.

Example:

val cache: mutable.Map[String, java.util.List[String]] = ChronicleMap
    .of(classOf[String], classOf[java.util.List])
    .averageKey("key")
    .averageValue(...)
    .valueMarshaller(ListMarshaller.of(
        CharSequenceBytesReader.INSTANCE,
        CharSequenceBytesWriter.INSTANCE
    ))
    .entries(...)
    .createPersistedTo(file)
leventov
  • 14,760
  • 11
  • 69
  • 98