1

I've been asked to parse a JSON file to get all the buses that are over a specified speed inputed by the user.

The JSON file can be downloaded here

It's like this:

{
"COLUMNS": [
    "DATAHORA",
    "ORDEM",
    "LINHA",
    "LATITUDE",
    "LONGITUDE",
    "VELOCIDADE"
],
"DATA": [
    [
        "04-16-2015 00:00:55",
        "B63099",
        "",
        -22.7931,
        -43.2943,
        0
    ],
    [
        "04-16-2015 00:01:02",
        "C44503",
        781,
        -22.853649,
        -43.37616,
        25
    ],
    [
        "04-16-2015 00:11:40",
        "B63067",
        "",
        -22.7925,
        -43.2945,
        0
    ],
]
}

The thing is: I'm really new to scala and I have never worked with json before (shame on me). What I need is to get the "Ordem", "Linha" and "Velocidade" from DATA node.

I created a case class to enclousure all the data so as to later look for those who are over the specified speed.

case class Bus(ordem: String, linha: Int, velocidade: Int)

I did this reading the file as a textFile and spliting. Although this way, I need to foreknow the content of the file in order to go to the lines after DATA node.

I want to know how to do this using a JSON parser. I've tried many solutions, but I couldn't adapt to my problem, because I need to extract all the lines from DATA node instead of nodes inside one node.

Can anyone help me?

PS: Sorry for my english, not a native speaker.

PhaSath
  • 166
  • 1
  • 12

3 Answers3

3

First of all, you need to understand the different JSON data types. The basic types in JSON are numbers, strings, booleans, arrays, and objects. The data returned in your example is an object with two keys: COLUMNS and DATA. The COLUMNS key has a value that is an array of strings and numbers. The DATA key has a value which is an array of arrays of strings.

You can use a library like PlayJSON to work with this type of data:

val js = Json.parse(x).as[JsObject]
val keys = (js \ "COLUMNS").as[List[String]]
val values = (js \ "DATA").as[List[List[JsValue]]]

val busses = values.map(valueList => {
    val keyValues = (keys zip valueList).toMap 
    for {
      ordem <- keyValues("ORDEM").asOpt[String]
      linha <- keyValues("LINHA").asOpt[Int]
      velocidade <- keyValues("VELOCIDADE").asOpt[Int]
    } yield Bus(ordem, linha, velocidade)
})

Note the use of asOpt when converting the properties to the expected types. This operator converts the key-values to the provided type if possible (wrapped in Some), and returns None otherwise. So, if you want to provide a default value instead of ignoring other results, you could use keyValues("LINHA").asOpt[Int].getOrElse(0), for example.

You can read more about the Play JSON methods used here, like \ and as, and asOpt in their docs.

Ben Reich
  • 16,222
  • 2
  • 38
  • 59
  • Hey, I liked your solution. But I'm having a problem: PlayJSON is a framework, right? I've been working with Intellij Community Edition, and it doesn't work with framework to the furthest of my knowledge. Is there any other solution or did I let something pass? – PhaSath Apr 23 '15 at 23:31
  • 1
    You can definitely use `PlayJSON` in IntelliJ Community Edition! I've done it many times. It's especially easy if you set up an SBT project: https://confluence.jetbrains.com/display/IntelliJIDEA/Getting+Started+with+SBT and then add the PlayJSON SBT dependency: http://stackoverflow.com/a/19438083/1223622 – Ben Reich Apr 23 '15 at 23:35
  • Okay, I've downloaded Play, installed and runned the activator. I've also created a project on the activator UI. Now, based on what @Ben Reich said, I'm trying to add PlayJSON. My project is actually a sbt project, and I've already tried `name := "JsonParser" version := "1.0" resolvers += "Typesafe Repo" at "http://repo.typesafe.com/typesafe/releases/" scalaVersion := "2.11.6" addSbtPlugin("play" % "sbt-plugin" % "2.3.4") libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" libraryDependencies += "com.typesafe.play" %% "play-json" % "2.3.4" ` – PhaSath Apr 23 '15 at 23:52
  • But it says on `import play.api.libs.json` that cannot resolve play symbol nor Json.parse(x).as[JsObject]. – PhaSath Apr 23 '15 at 23:54
  • 1
    @PhaSath I think this is a new question at this point! Make sure that you have the SBT plugin for IntelliJ: http://blog.jetbrains.com/scala/2013/11/18/built-in-sbt-support-in-intellij-idea-13/ and that you're putting the dependencies in the correct `build.sbt` file: http://www.scala-sbt.org/0.13/tutorial/Basic-Def.html. If that doesn't work, maybe consider opening a new question. – Ben Reich Apr 24 '15 at 00:18
  • For those who found the same problem as me, I found the sollution: The build.sbt should have: ` resolvers += "Typesafe Repo" at "http://repo.typesafe.com/typesafe/releases/" libraryDependencies += "com.typesafe.play" %% "play" % "2.3.8"` Looks like I "play-json" doesn't work. Now it's fully working. Thanks to everyone. – PhaSath Apr 24 '15 at 16:15
  • Now, I'm facing another problem: whenever "Linha" is "" instead of a Int, it yields a None. Do you know any way to get passed this? I've tried to convert to String, to Int, to check whether it is "" or a num, but nothing worked well. I've also tried to put a constructor to the case class, but didn't work. – PhaSath Apr 24 '15 at 17:34
  • 1
    I've added a clarifying edit, but feel free to open a new question as opposed to continuing the discussion here. – Ben Reich Apr 24 '15 at 18:04
1

You can use Spark SQL to achieve it. Refer section under JSON Datasets here

In essence, Use spark APIs to load a JSON and register it as temp table. You can run your SQL queries on the table from there.

urug
  • 405
  • 7
  • 18
0

As seen on @Ben Reich answer, that code works great. Thank you very much.

Although, my Json had some type problems on "Linha". As it can be seen on the JSON example that I put on the Question, there are "" and also numbers, e.g., 781.

When trying to do keyValues("LINHA").asOpt[Int].getOrElse(0), it was producing an error saying that value flatMap is not a member of Int. So, I had to change some things:

case class BusContainer(ordem: String, linha: String, velocidade: Int)
val jsonString = fromFile("./project/rj_onibus_gps.json").getLines.mkString
val js = Json.parse(jsonString).as[JsObject]
val keys = (js \ "COLUMNS").as[List[String]]
val values = (js \ "DATA").as[List[List[JsValue]]]

val buses = values.map(valueList => {
  val keyValues = (keys zip valueList).toMap
  println(keyValues("ORDEM"),keyValues("LINHA"),keyValues("VELOCIDADE"))
  for {
    ordem <- keyValues("ORDEM").asOpt[String]
    linha <- keyValues("LINHA").asOpt[Int].orElse(keyValues("LINHA").asOpt[String])
    velocidade <- keyValues("VELOCIDADE").asOpt[Int]
  } yield BusContainer(ordem, linha.toString, velocidade)
})

Thanks for the help!

PhaSath
  • 166
  • 1
  • 12