
How to parse and flatten a nested JSON stored in a Hive/HBase column using Spark Scala?

Example:

A Hive table has a column "c1" containing the following JSON:

{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red",
    "Lines": [{
            "LineNumber": 1,
            "Text": "ABC"
        },
        {
            "LineNumber": 2,
            "Text": "123"
        }
     ]
}

I want to parse this JSON and create a DataFrame with columns and values like this:
+------+------+-------+------------+------+
|fruit | size | color | LineNumber | Text |
+------+------+-------+------------+------+
|Apple | Large| Red   | 1          | ABC  |
|Apple | Large| Red   | 2          | 123  |
+------+------+-------+------------+------+

Appreciate any thoughts. Thanks!

Lux

3 Answers


Convert your JSON to a String using mkString and then use the following code:

// Build an RDD[String] holding the JSON text and let Spark infer the schema
val otherFruitRDD = spark.sparkContext.makeRDD(
  """{"Fruitname":"Jack","fruitDetails":{"fruit":"Apple","size":"Large"}}""" :: Nil)

val otherFruit = spark.read.json(otherFruitRDD)

otherFruit.show()
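
If you want to apply the same idea to the actual Hive column instead of a hard-coded string, a minimal sketch (table and column names are assumptions) reads the column as a Dataset[String] and hands it to spark.read.json, which accepts a Dataset[String] from Spark 2.2 onwards:

import spark.implicits._

// Pull the JSON strings out of the Hive column "c1" (table name is an assumption)
val jsonStrings = spark.table("mydb.mytable").select("c1").as[String]

// Let Spark infer the schema from the JSON strings
val parsed = spark.read.json(jsonStrings)
parsed.printSchema()
parsed.show()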

Vivek

 val df = spark.read.json("example.json")

You can find detailed examples at the following link.
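
One detail worth noting: the JSON in the question is pretty-printed across several lines, while spark.read.json expects one JSON document per line by default. A minimal sketch, assuming Spark 2.2+ where the multiLine option is available:

// Read a pretty-printed (multi-line) JSON file instead of line-delimited JSON
val multiLineDf = spark.read.option("multiLine", true).json("example.json")
multiLineDf.printSchema()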

vaquar khan

I think you need a method like this:

df.select(from_json($"c1", schema))

Here schema will be a StructType describing the structure of the JSON; for you it will contain a. fruit, b. size, c. color.
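
To get all the way to the flattened output shown in the question, a minimal sketch building on from_json (column and variable names are assumptions): define a schema that includes the Lines array, then explode it so each array element becomes its own row.

import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// Schema matching the JSON in the question, including the nested Lines array
val schema = StructType(Seq(
  StructField("fruit", StringType),
  StructField("size", StringType),
  StructField("color", StringType),
  StructField("Lines", ArrayType(StructType(Seq(
    StructField("LineNumber", LongType),
    StructField("Text", StringType)
  ))))
))

// df is assumed to be the Hive table containing the JSON string column "c1"
val flat = df
  .withColumn("j", from_json(col("c1"), schema))
  .withColumn("line", explode(col("j.Lines")))
  .select(
    col("j.fruit").as("fruit"),
    col("j.size").as("size"),
    col("j.color").as("color"),
    col("line.LineNumber").as("LineNumber"),
    col("line.Text").as("Text"))

flat.show()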

Subhasish Guha

  • I tried from_json, but it's a nested JSON, and from_json adds complexity when we want to parse each element and create a schema for each element in the JSON. – Lux Apr 17 '19 at 13:08
  • You can try a Jackson complex JSON parser: take the JSON column as a String in a UDF and flatten it there (a rough sketch of that idea follows below). – Subhasish Guha Apr 18 '19 at 07:41
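
For completeness, a rough sketch of the Jackson-in-a-UDF idea from the last comment. The case class names are my own, it assumes jackson-module-scala (compatible with the Jackson version shipped with Spark) is on the classpath, and df is again the Hive table with the JSON string in column "c1":

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.apache.spark.sql.functions.{col, explode, udf}

// Case classes mirroring the JSON in the question (names are assumptions)
case class Line(LineNumber: Long, Text: String)
case class FruitRecord(fruit: String, size: String, color: String, Lines: Seq[Line])

// Parse the JSON string with Jackson inside a UDF; the mapper is created per call
// to keep the sketch simple and avoid serialization concerns (not tuned for speed)
val parseJson = udf { json: String =>
  val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
  mapper.readValue(json, classOf[FruitRecord])
}

// Flatten the parsed struct: explode Lines so each element becomes its own row
val flat = df
  .withColumn("p", parseJson(col("c1")))
  .withColumn("line", explode(col("p.Lines")))
  .select(
    col("p.fruit"), col("p.size"), col("p.color"),
    col("line.LineNumber"), col("line.Text"))

flat.show()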