How to read a multiline nested JSON in Spark Scala

Question

I have a JSON file like the one below:

[
 {
  "WHO": "Joe",
  "WEEK": [
    {
      "NUMBER": 3,
      "EXPENSE": [
        {
          "WHAT": "BEER",
          "AMOUNT": 18.00
        },
        {
          "WHAT": "Food",
          "AMOUNT": 12.00
        },
        {
          "WHAT": "Food",
          "AMOUNT": 19.00
        },
        {
          "WHAT": "Car",
          "AMOUNT": 20.00
        }
      ]
    }
  ]
 }
]

I ran the following code:

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val jsonRDD = sc.wholeTextFiles("/test.json").map(x => x._2)
val jason = sqlContext.read.json(jsonRDD)
jason.show

The output shows WrappedArray values for the nested columns. How can I explode the data?

What output are you expecting? – Nikunj Kakadiya, Feb 06 '21 at 04:44

score 1 · Accepted Answer · answered Feb 06 '21 at 05:41

You don't need to read the file with wholeTextFiles; you can read it as JSON directly. You just need to set the multiLine option to true, because by default Spark expects one JSON record per line.

val df = spark.read.option("multiLine", true).json("/test.json")
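
For use outside spark-shell, the same read can be sketched as a small standalone program. The object name, the helper readMultilineJson, the app name and the local[*] master are assumptions for illustration; only the multiLine option and the /test.json path come from the post. printSchema is a quick way to check that the nesting was parsed as expected.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadMultilineJson {
  // Hypothetical helper: reads a JSON file whose records span several lines.
  def readMultilineJson(spark: SparkSession, path: String): DataFrame =
    spark.read.option("multiLine", true).json(path)

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .appName("read-multiline-json")
      .master("local[*]")
      .getOrCreate()

    val df = readMultilineJson(spark, "/test.json")
    df.printSchema()   // shows WHO plus the nested WEEK / EXPENSE structure
    df.show(false)     // false = do not truncate long cell values

    spark.stop()
  }
}
```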

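The resulting DataFrame has two columns: WHO (a string) and WEEK (an array of structs, each holding NUMBER and a nested EXPENSE array).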

To pull values out of the array columns, you can use selectExpr with dot and index notation, as below:

val df1 = df.selectExpr("WHO","Week.Expense[0].amount as Amount","Week.Expense[0].What as What","WEEK.Number as Number")

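In the result, Amount, What and Number are still arrays: Week.Expense[0] selects the whole expense list of the first week, not a single expense.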
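
If plain values are wanted from index notation, each array level needs its own index. A minimal sketch, assuming every WEEK and EXPENSE array has at least one element; the object and function names are hypothetical:

```scala
import org.apache.spark.sql.DataFrame

object FirstExpense {
  // Hypothetical helper: picks the first expense of the first week as plain columns.
  // WEEK[0] is a struct, WEEK[0].EXPENSE is an array, WEEK[0].EXPENSE[0] is a struct.
  def firstExpense(df: DataFrame): DataFrame =
    df.selectExpr(
      "WHO",
      "WEEK[0].NUMBER as Number",
      "WEEK[0].EXPENSE[0].WHAT as What",
      "WEEK[0].EXPENSE[0].AMOUNT as Amount"
    )
}
```

For the sample file this returns a single row (Joe, 3, BEER, 18.0). It only ever reads one element, so it is no substitute for explode when all expenses are needed.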

You can also combine select with explode, which produces one row per element of the WEEK array:

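import org.apache.spark.sql.functions.explode
import spark.implicits._  // needed for the $"..." column syntax outside spark-shell

val df2 = df
  .select($"WHO", explode($"WEEK").as("c1"))  // one row per week
  .select("WHO", "c1.Number", "c1.Expense.amount", "c1.Expense.what")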

The result has one row per week, with Number as a plain value and amount and what as arrays holding that week's expenses.
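
To get one row per expense instead, the nested EXPENSE array can be exploded as well. This is a sketch rather than part of the original answer; the object and function names are hypothetical, and the column names are taken from the JSON in the question. Spark allows only one generator such as explode per select, hence the two steps.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

object FlattenExpenses {
  // Hypothetical helper: one output row per (WHO, week, expense).
  def flatten(df: DataFrame): DataFrame =
    df
      // Step 1: one row per element of the WEEK array.
      .select(col("WHO"), explode(col("WEEK")).as("week"))
      // Step 2: one row per element of that week's EXPENSE array.
      .select(
        col("WHO"),
        col("week.NUMBER").as("NUMBER"),
        explode(col("week.EXPENSE")).as("expense")
      )
      // Step 3: lift the struct fields into top-level columns.
      .select(
        col("WHO"),
        col("NUMBER"),
        col("expense.WHAT").as("WHAT"),
        col("expense.AMOUNT").as("AMOUNT")
      )
}
```

Calling FlattenExpenses.flatten(df).show() on the sample file should give four rows, one for each of Joe's expenses in week 3.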
