
I have a JSON file like the one below:

[
 {
  "WHO": "Joe",
  "WEEK": [
    {
      "NUMBER": 3,
      "EXPENSE": [
        {
          "WHAT": "BEER",
          "AMOUNT": 18.00
        },
        {
          "WHAT": "Food",
          "AMOUNT": 12.00
        },
        {
          "WHAT": "Food",
          "AMOUNT": 19.00
        },
        {
          "WHAT": "Car",
          "AMOUNT": 20.00
        }
      ]
    }
  ]
 }
]

I ran the following code:

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val jsonRDD = sc.wholeTextFiles("/test.json").map(x => x._2)
val jason = sqlContext.read.json(jsonRDD)
jason.show


It shows WrappedArray in the output. How can we explode the data?

Vivarsh

1 Answer


You don't need to read the file with wholeTextFiles; you can read it as JSON directly. You just need to set the multiLine option to true, because the file spreads a single JSON document over several lines.

val df = spark.read.option("multiLine", true).json("/test.json")
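To check what Spark inferred before exploding anything, it can help to print the schema. The following is a minimal sketch, not the original poster's setup: it assumes a local Spark 2.2+ session and writes a shortened copy of the sample to a temporary file (the names path and sample are hypothetical) so that it does not depend on /test.json existing.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: local mode; inside spark-shell getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("multiline-sketch").getOrCreate()

// Hypothetical stand-in for /test.json: a shortened copy of the sample, spread over several lines.
val sample = """[
  {
    "WHO": "Joe",
    "WEEK": [
      { "NUMBER": 3, "EXPENSE": [ { "WHAT": "BEER", "AMOUNT": 18.00 } ] }
    ]
  }
]"""
val path = Files.createTempFile("test", ".json")
Files.write(path, sample.getBytes("UTF-8"))

// multiLine = true lets one JSON document span several lines of the file.
val df = spark.read.option("multiLine", true).json(path.toUri.toString)

// Prints the inferred schema: WEEK is an array of structs, each holding an EXPENSE array.
df.printSchema()
```

The nested array types in the printed schema are what show up as WrappedArray when the rows are displayed.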


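To pull the nested fields out of the array columns, you can use selectExpr, which exposes them as top-level columns. Here Expense[0] picks the EXPENSE array of the first week, so Amount and What come back as arrays: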

val df1 = df.selectExpr("WHO","Week.Expense[0].amount as Amount","Week.Expense[0].What as What","WEEK.Number as Number")


You can also combine select with explode (from org.apache.spark.sql.functions) to get a similar result:

val df2 = df.select($"WHO",explode($"Week").as("c1")).select("WHO","c1.Expense","c1.Number","c1.Expense.amount","c1.Expense.what").drop("Expense")
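Both snippets above explode at most the WEEK array, so the amounts and descriptions still come back as arrays. If one row per expense is wanted, one option is to explode twice, once per nesting level. This is a minimal sketch rather than the only way to do it: it assumes Spark 2.2+ in local mode, inlines the sample JSON from the question as a string, and the names json, flat, week and expense are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Assumption: local mode; inside spark-shell getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("explode-sketch").getOrCreate()
import spark.implicits._

// The sample document from the question, inlined so the sketch needs no file.
val json = """[{"WHO":"Joe","WEEK":[{"NUMBER":3,"EXPENSE":[
  {"WHAT":"BEER","AMOUNT":18.00},{"WHAT":"Food","AMOUNT":12.00},
  {"WHAT":"Food","AMOUNT":19.00},{"WHAT":"Car","AMOUNT":20.00}]}]}]"""
val df = spark.read.json(Seq(json).toDS())

val flat = df
  // First explode: one row per element of the WEEK array.
  .select(col("WHO"), explode(col("WEEK")).as("week"))
  // Second explode: one row per element of that week's EXPENSE array.
  .select(col("WHO"), col("week.NUMBER").as("NUMBER"), explode(col("week.EXPENSE")).as("expense"))
  // Finally lift the struct fields into plain columns.
  .select(col("WHO"), col("NUMBER"), col("expense.WHAT").as("WHAT"), col("expense.AMOUNT").as("AMOUNT"))

flat.show()
```

Each select contains a single explode because Spark allows only one generator per select clause. For the sample data this should give one row per expense, with WHO and NUMBER repeated on every row.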


Nikunj Kakadiya