
I have a String containing many records in JSON format. I need to convert each JSON record to a one-line JSON record.

Example input:

{
  "field1" : "aa11",
  "field2" : "aa22",
  "structField" : {
    "sf1" : "aaa11",
    "sf2" : "aaa22"
  }
}, {
  "field1" : "bb11",
  "field2" : "bb22",
  "structField" : {
    "sf1" : "bbb11",
    "sf2" : "bbb22"
  }
}, {
  "field1" : "cc11",
  "field2" : "cc22",
  "structField" : {
    "sf1" : "ccc11",
    "sf2" : "ccc22"
  }
}

Expected output:

{"field1":"aa11","field2":"aa22","structField":{"sf1":"aaa11","sf2":"aaa22"}},
{"field1":"bb11","field2":"bb22","structField":{"sf1":"bbb11","sf2":"bbb22"}},
{"field1":"cc11","field2":"cc22","structField":{"sf1":"ccc11","sf2":"ccc22"}}

I am using Scala to parse the String, split it on "}, {" and reformat each JSON record:

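// strip the outer "{\n" and "\n}", split on the record separator,
// then re-add the braces and drop the newlines inside each record
// (indentation spaces are left in place)
myMultiJSONString.
  substring(2, myMultiJSONString.length - 2).
  split("\\}, \\{").
  map(reg => "{" + reg.trim.replaceAll("\\n", "") + "}")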

I think this is a dirty approach.
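The split approach depends on the exact "}, {" spacing between records and breaks if a string value happens to contain that sequence. A slightly less fragile stdlib-only sketch, assuming the input is well-formed JSON objects separated by commas: strip whitespace outside string literals, then cut the stream into top-level objects by tracking brace depth (also outside strings). The names `OneLineJson`, `compact`, `splitTopLevel` and `toOneLineRecords` are hypothetical, and the code does no validation (unbalanced braces are not detected), so a real JSON library is still the safer choice.

```scala
object OneLineJson {

  // Removes whitespace that appears outside JSON string literals.
  // Whitespace inside quoted strings is kept; escaped quotes (\") do not end a string.
  def compact(s: String): String = {
    val sb = new StringBuilder
    var inString = false
    var escaped = false
    for (c <- s) {
      if (inString) {
        sb.append(c)
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else if (c == '"') {
        inString = true
        sb.append(c)
      } else if (!c.isWhitespace) {
        sb.append(c)
      }
    }
    sb.toString
  }

  // Splits a compacted "{...},{...}" stream into top-level objects by tracking
  // brace depth outside string literals; the commas between records are dropped.
  // Assumes well-formed input: no check for unbalanced braces.
  def splitTopLevel(compacted: String): Vector[String] = {
    val records = Vector.newBuilder[String]
    var depth = 0
    var start = -1
    var inString = false
    var escaped = false
    for (i <- compacted.indices) {
      val c = compacted(i)
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' =>
          if (depth == 0) start = i
          depth += 1
        case '}' =>
          depth -= 1
          if (depth == 0) records += compacted.substring(start, i + 1)
        case _ => // commas, colons and literals need no tracking
      }
    }
    records.result()
  }

  def toOneLineRecords(multi: String): Vector[String] = splitTopLevel(compact(multi))

  def main(args: Array[String]): Unit = {
    val input =
      """{
        |  "field1" : "aa11",
        |  "structField" : { "sf1" : "a }, { b" }
        |}, {
        |  "field1" : "bb11"
        |}""".stripMargin
    // one record per line, separated by "," as in the expected output
    println(toOneLineRecords(input).mkString(",\n"))
  }
}
```

Running `main` prints two one-line records; the first keeps the string value "a }, { b" intact, which a plain split on "}, {" would have cut in half.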

Is there a library that can help with this?

For example, deserializing the JSON String into "something" and later serializing it back as a one-line JSON String.

Any ideas?

Thanks!

Jason Aller
icuefue
  • One option is to use Apache Spark: read the multiline JSON and write it back as JSON, which would be equivalent to your output. But using Spark only for this small task doesn't make sense. – koiralo May 07 '18 at 15:19

2 Answers


If the input JSON is not too large (the whole string is parsed in memory), one possible approach that avoids "dirty" string manipulation is to parse the input with a JSON library and write each record back out with the "pretty print" feature disabled.

The structure of the input data does not matter: the parser handles the nested objects, so this can be done almost directly.

For example, using Json4s:

import org.json4s._
import org.json4s.jackson.JsonMethods._
import org.json4s.jackson.Serialization.write

implicit val formats: Formats = DefaultFormats

// `json` is the multi-record input String;
// since it is not wrapped as a JSON array, wrap it so it parses properly
val wrappedAsJsonArray = s"[$json]"

val parsed = parse(wrappedAsJsonArray)

parsed.children.foreach(obj => {
  val oneLineJson = write(obj) + "," // write produces compact (non-pretty) JSON
  println(oneLineJson) // or write to an output file
})

// the output:
{"field1":"aa11","field2":"aa22","structField":{"sf1":"aaa11","sf2":"aaa22"}},
{"field1":"bb11","field2":"bb22","structField":{"sf1":"bbb11","sf2":"bbb22"}},
{"field1":"cc11","field2":"cc22","structField":{"sf1":"ccc11","sf2":"ccc22"}},
Antot
  • Thank you! It is a simple and easy solution and I am using it in my use case. – icuefue May 08 '18 at 07:52
  • For other users, imports needed: `import org.json4s._ import org.json4s.jackson.JsonMethods._ import org.json4s.jackson.Serialization.write` – icuefue May 08 '18 at 08:37
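Those imports come from the json4s Jackson backend, so the project also needs that module on the classpath. A minimal build.sbt line, as a sketch; the version number below is an assumption, so pick one that is published for your Scala version:

```scala
// Assumed version; json4s-jackson provides org.json4s.jackson.JsonMethods
// and org.json4s.jackson.Serialization
libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.5.3"
```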

It is generally better to use a proper JSON API if one fits your use case. There are many JSON APIs for Scala; see "What JSON library to use in Scala?"

I would suggest circe, a functional Scala JSON API. It has pretty good documentation: https://circe.github.io/circe/parsing.html

Example:

import io.circe._, io.circe.parser._

object CirceAgainSerialisers {

  def main(args: Array[String]): Unit = {

    val rawFakeJson: String =
      """
        |  {
        |    "field1": "aa11",
        |    "field2": "aa22",
        |    "structField": {
        |      "sf1": "aaa11",
        |      "sf2": "aaa22"
        |    }
        |  },
        |  {
        |    "field1": "bb11",
        |    "field2": "bb22",
        |    "structField": {
        |      "sf1": "bbb11",
        |      "sf2": "bbb22"
        |    }
        |  },
        |  {
        |    "field1": "cc11",
        |    "field2": "cc22",
        |    "structField": {
        |      "sf1": "ccc11",
        |      "sf2": "ccc22"
        |    }
        |  }
      """.stripMargin

    val deserialised: Either[ParsingFailure, Json] = parse(s"[$rawFakeJson]")

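    // Json's default toString is pretty-printed, so render each record with noSpaces
    // and put one record per line
    val fakeSerialise = deserialised.map(json => json.asArray.getOrElse(Vector.empty).map(_.noSpaces).mkString(",\n"))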

    fakeSerialise match {
      case Right(json) => println(json)
      case Left(failed) => println(failed)
    }
  }
}

Your build.sbt would look like this:

name := "serialisers-deserialisers"

version := "0.1"

scalaVersion := "2.12.2"

val circeVersion = "0.9.3"

libraryDependencies ++= Seq(
  "io.circe" %% "circe-core",
  "io.circe" %% "circe-generic",
  "io.circe" %% "circe-parser"
).map(_ % circeVersion)
prayagupa
  • Thank you! I tested it and it works perfectly, but in my actual use case I'm going to use Antot's solution. I will read about the circe API because I will have to do complex JSON operations in the future. Thank you! – icuefue May 08 '18 at 07:55