I'm running the following Scala code:
    import scala.util.parsing.json._
    import scala.io._

    object Main {
      // Parse one line of JSON and keep only the string-to-string entries.
      def jsonStringMap(str: String) =
        JSON.parseFull(str) match {
          case Some(m: Map[_,_]) =>
            m.collect {
              // If this doesn't match, we'll just ignore the value
              case (k: String, v: String) => (k,v)
            }.toMap
          case _ => Map[String,String]()
        }

      def main(args: Array[String]): Unit = {
        val fh = Source.fromFile("listings.txt")
        try {
          fh.getLines().map(jsonStringMap).foreach { v => println(v) }
        } finally {
          fh.close()
        }
      }
    }
On my machine it takes ~3 minutes on the file from http://sortable.com/blog/coding-challenge/. Equivalent Haskell and Ruby programs I wrote take under 4 seconds. What am I doing wrong?
I tried the same code without the map(jsonStringMap) and it was plenty fast, so is the JSON parser just really slow?
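One way to confirm that the parsing step is where the time goes is to time the loop with and without it. Below is a minimal sketch of such a check; the names `ParseTiming`, `timed`, and `parseAll` are hypothetical helpers, not part of my program above, and the parser is passed in as a plain function so that different libraries can be swapped in.

```scala
object ParseTiming {
  // Evaluates `body` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  // Applies `parse` to every line and returns how many lines were processed.
  // Calling `size` forces the lazy iterator, so the parsing work really runs.
  def parseAll(lines: Iterator[String], parse: String => Map[String, String]): Int =
    lines.map(parse).size
}
```

Calling `ParseTiming.timed(ParseTiming.parseAll(lines, jsonStringMap))` and then the same thing with a do-nothing function such as `_ => Map.empty[String, String]` should separate the file-reading cost from the parsing cost. A single run like this includes JIT warm-up, so the numbers are only a rough guide.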
It does seem likely that the default JSON parser is just really slow. However, I tried https://github.com/stevej/scala-json, and while that gets it down to 35 seconds, that's still much slower than Ruby.
I am now using https://github.com/codahale/jerkson, which is even faster! My program now runs in only 6 seconds on my data. That is only 3 seconds slower than Ruby, and the remaining gap is probably just the JVM starting up.