
How to parse and flatten a nested JSON stored in a Hive/HBase column using Spark Scala?

Example:

A Hive table has a column "c1" containing the following JSON:

{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red",
    "Lines": [{
            "LineNumber": 1,
            "Text": "ABC"
        },
        {
            "LineNumber": 2,
            "Text": "123"
        }
     ]
}

I want to parse this JSON and create a DataFrame with columns and values like this:
+------+------+-------+------------+------+
|fruit | size | color | LineNumber | Text |
+------+------+-------+------------+------+
|Apple | Large| Red   | 1          | ABC  |
|Apple | Large| Red   | 2          | 123  |
+------+------+-------+------------+------+

Appreciate any thoughts. Thanks!

Lux

3 Answers


Convert your JSON to a String using mkString and then use the following code:

// Build an RDD[String] holding the JSON text and let Spark infer the schema
val otherFruitRDD = spark.sparkContext.makeRDD(
  """{"Fruitname":"Jack","fruitDetails":{"fruit":"Apple","size":"Large"}}""" :: Nil)

val otherFruit = spark.read.json(otherFruitRDD)

otherFruit.show()
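
If you want to apply the same idea to the actual Hive column instead of a hard-coded string, a minimal sketch (table and column names are assumptions) reads the column as a Dataset[String] and hands it to spark.read.json, which accepts a Dataset[String] from Spark 2.2 onwards:

import spark.implicits._

// Pull the JSON strings out of the Hive column "c1" (table name is an assumption)
val jsonStrings = spark.table("mydb.mytable").select("c1").as[String]

// Let Spark infer the schema from the JSON strings
val parsed = spark.read.json(jsonStrings)
parsed.printSchema()
parsed.show()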

Vivek

 val df = spark.read.json("example.json")

You can find detailed examples at the following link.
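
One detail worth noting: the JSON in the question is pretty-printed across several lines, while spark.read.json expects one JSON document per line by default. A minimal sketch, assuming Spark 2.2+ where the multiLine option is available:

// Read a pretty-printed (multi-line) JSON file instead of line-delimited JSON
val multiLineDf = spark.read.option("multiLine", true).json("example.json")
multiLineDf.printSchema()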

vaquar khan

I think you need a method like this:

df.select(from_json($"c1", schema))

Here schema will be a StructType describing the structure of the JSON; for you it will contain a. fruit, b. size, c. color.
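
To get all the way to the flattened output shown in the question, a minimal sketch building on from_json (column and variable names are assumptions): define a schema that includes the Lines array, then explode it so each array element becomes its own row.

import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// Schema matching the JSON in the question, including the nested Lines array
val schema = StructType(Seq(
  StructField("fruit", StringType),
  StructField("size", StringType),
  StructField("color", StringType),
  StructField("Lines", ArrayType(StructType(Seq(
    StructField("LineNumber", LongType),
    StructField("Text", StringType)
  ))))
))

// df is assumed to be the Hive table containing the JSON string column "c1"
val flat = df
  .withColumn("j", from_json(col("c1"), schema))
  .withColumn("line", explode(col("j.Lines")))
  .select(
    col("j.fruit").as("fruit"),
    col("j.size").as("size"),
    col("j.color").as("color"),
    col("line.LineNumber").as("LineNumber"),
    col("line.Text").as("Text"))

flat.show()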

Subhasish Guha

  • I tried from_json, but it's a nested JSON, and from_json adds complexity when we want to parse each element and create a schema for each element in the JSON. – Lux Apr 17 '19 at 13:08
  • You can try a Jackson complex JSON parser: take the JSON column as a String in a UDF and flatten it there (a rough sketch of that idea follows below). – Subhasish Guha Apr 18 '19 at 07:41
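
For completeness, a rough sketch of the Jackson-in-a-UDF idea from the last comment. The case class names are my own, it assumes jackson-module-scala (compatible with the Jackson version shipped with Spark) is on the classpath, and df is again the Hive table with the JSON string in column "c1":

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.apache.spark.sql.functions.{col, explode, udf}

// Case classes mirroring the JSON in the question (names are assumptions)
case class Line(LineNumber: Long, Text: String)
case class FruitRecord(fruit: String, size: String, color: String, Lines: Seq[Line])

// Parse the JSON string with Jackson inside a UDF; the mapper is created per call
// to keep the sketch simple and avoid serialization concerns (not tuned for speed)
val parseJson = udf { json: String =>
  val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
  mapper.readValue(json, classOf[FruitRecord])
}

// Flatten the parsed struct: explode Lines so each element becomes its own row
val flat = df
  .withColumn("p", parseJson(col("c1")))
  .withColumn("line", explode(col("p.Lines")))
  .select(
    col("p.fruit"), col("p.size"), col("p.color"),
    col("line.LineNumber"), col("line.Text"))

flat.show()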