
I'm running the following Scala code:

import scala.util.parsing.json._
import scala.io._

object Main {
  def jsonStringMap(str: String) =
    JSON.parseFull(str) match {
      case Some(m: Map[_, _]) => m collect {
          // If this doesn't match, we'll just ignore the value
          case (k: String, v: String) => (k, v)
        } toMap
      case _ => Map[String, String]()
    }

  def main(args: Array[String]) {
    val fh = Source.fromFile("listings.txt")
    try {
      fh.getLines map(jsonStringMap) foreach { v => println(v) }
    } finally {
      fh.close
    }
  }
}

On my machine it takes ~3 minutes on the file from http://sortable.com/blog/coding-challenge/. Equivalent Haskell and Ruby programs I wrote take under 4 seconds. What am I doing wrong?

I tried the same code without the map(jsonStringMap) and it was plenty fast, so is the JSON parser just really slow?
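To check that, something like the following isolates the parse step (a rough sketch: it reads the whole file into memory first so file I/O doesn't count toward the timing):

import scala.io.Source
import scala.util.parsing.json.JSON

object TimeParse {
  def main(args: Array[String]) {
    val src = Source.fromFile("listings.txt")
    // Read everything up front so only the parsing is timed.
    val lines = try { src.getLines().toList } finally { src.close }
    val start = System.nanoTime
    lines foreach { line => JSON.parseFull(line) }
    println("parsing took " + (System.nanoTime - start) / 1e9 + " seconds")
  }
}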

It does seem likely that the default JSON parser is just really slow; however, I tried https://github.com/stevej/scala-json, and while that gets it down to 35 seconds, that's still much slower than Ruby.

I am now using https://github.com/codahale/jerkson which is even faster! My program now runs in only 6 seconds on my data, just 3 seconds slower than Ruby, and that difference is probably just the JVM starting up.
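For reference, the Jerkson-based jsonStringMap looks roughly like this (a sketch built from Jerkson's documented parse[A] API rather than a verbatim copy of my program; handling of malformed lines is omitted):

import com.codahale.jerkson.Json

// Parse a line into an untyped map, then keep only the string-valued entries,
// mirroring what the original jsonStringMap did.
def jsonStringMap(str: String): Map[String, String] =
  Json.parse[Map[String, Any]](str) collect {
    case (k, v: String) => (k, v)
  }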

singpolyma
  • maybe a better fit for codereview.stackexchange.com – Nettogrof Feb 23 '12 at 02:28
  • Offhand, it seems like you are parsing each line independently. Have you tried invoking the parser once for the whole JSON document? – Chris Shain Feb 23 '12 at 02:48
  • @ChrisShain I could turn the whole file into a JSON document, but (a) I don't see how that would be faster, because it couldn't even stream the lines in from the file; it would have to read everything at once, and (b) the Ruby version does the same line-by-line thing, so why is it so much faster? – singpolyma Feb 23 '12 at 02:53
  • I'm pretty sure the answer is just, "Nobody bothered to write a fast JSON parser." Parser combinators, of which the JSON parser is one, are for _ease of creation_, not performance. If you want speed, you'd be better off with a Java JSON library. – Rex Kerr Feb 23 '12 at 03:44
  • The lift-json library should be fast; it's referenced in the answers to this question: http://stackoverflow.com/questions/927983/how-can-i-construct-and-parse-a-json-string-in-scala-lift (a usage sketch follows these comments) – Phil Feb 23 '12 at 05:18
  • @Phil do you know where I can get binaries for the lift JSON library? It wouldn't build out of the box on my system... – singpolyma Feb 23 '12 at 07:18
  • You could bring in the whole Lift web framework as here: http://liftweb.net/, or you could use Maven http://www.assembla.com/wiki/show/liftweb/Using_Maven, or you could try to get the JAR from here http://www.jarvana.com/jarvana/browse/net/liftweb/; you need the JAR that matches the Scala version you are using. I use SBT to build rather than Maven, which works well but is complex to set up. – Phil Feb 23 '12 at 11:43
  • Well, it would be easier to answer if we could see the Ruby/Haskell programs. – Daniel C. Sobral Feb 23 '12 at 14:31
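For reference, the lift-json approach suggested in the comments would look roughly like this (a sketch against lift-json's AST types; it hasn't been benchmarked on this data):

import net.liftweb.json._

// Parse one line into lift-json's AST and keep only the string-valued fields.
def jsonStringMap(str: String): Map[String, String] =
  parse(str) match {
    case JObject(fields) => fields.collect { case JField(k, JString(v)) => (k, v) }.toMap
    case _               => Map[String, String]()
  }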

3 Answers


A quick look at the scala-user archive seems to indicate that nobody is doing serious work with the JSON parser in the scala standard library.

See http://groups.google.com/group/scala-user/msg/fba208f2d3c08936

It seems the parser ended up in the standard library at a time when Scala was less in the spotlight and didn't face the expectations it does today.

huynhjl

Use Jerkson. Jerkson uses Jackson, which is consistently the fastest JSON library on the JVM, especially when stream reading/writing large documents.
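If you would rather skip the Scala wrapper, a minimal sketch of calling Jackson directly looks like this (using the Jackson 1.x package names that were current at the time; Jackson 2.x moved to com.fasterxml.jackson):

import org.codehaus.jackson.map.ObjectMapper

object JacksonDirect {
  // ObjectMapper is relatively expensive to construct but reusable and thread-safe.
  private val mapper = new ObjectMapper()

  // readTree parses one line into Jackson's JsonNode tree without any data binding.
  def parseLine(line: String) = mapper.readTree(line)
}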

Steve

Using my JSON library, I get an almost instantaneous parse of both files:

import com.github.seanparsons.jsonar._
import scala.io.Source

// Apply `transform` to the file's lines, making sure the source is closed afterwards.
def parseLines[T](file: String, transform: (Iterator[String]) => T): T = {
  val log = Source.fromFile(file)
  val logLines = log.getLines()
  try { transform(logLines) } finally { log.close() }
}

// Parse each line of the file as a separate JSON document.
def parseFile(file: String) = parseLines(file, (iterator) => iterator.map(Parser.parse(_)).toList)

parseFile("products.txt"); parseFile("listings.txt")

However, as mentioned in the comments, it would be more useful to parse the whole thing as a single JSON array rather than parsing lots of individual lines as this does; a sketch of that follows.
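A rough sketch of that, reusing parseLines from above (assuming Parser.parse accepts the combined array string, which hasn't been verified here):

// Wrap the newline-delimited objects into one JSON array and parse it in a single call.
def parseFileAsArray(file: String) =
  parseLines(file, (iterator) => Parser.parse(iterator.mkString("[", ",", "]")))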

Sean Parsons