I have data in a CSV file in the format below. I want to split the JSON in the Desc column and create a new column for each key. I am using Spark 2 with Scala.

+------+------------+----------------------------------+
|  id  |  Category  |           Desc                   |
+------+------------+----------------------------------+
|  201 |  MIS20     | { "Total": 200,"Defective": 21 } |
+------+------------+----------------------------------+
|  202 |  MIS30     | { "Total": 740,"Defective": 58 } |
+------+------------+----------------------------------+

The desired output would be:

+------+------------+---------+-------------+
|  id  |  Category  |  Total  |  Defective  |
+------+------------+---------+-------------+
|  201 |  MIS20     |  200    |   21        |
+------+------------+---------+-------------+
|  202 |  MIS30     |  740    |   58        |
+------+------------+---------+-------------+

Any help is highly appreciated.

zero323
Prashant

1 Answer

Create a schema for the inner JSON and apply it to the Desc column with the from_json function, as below:

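import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{LongType, StructType}
import spark.implicits._ // enables the $"col" syntax; assumes a SparkSession named spark

// Schema of the JSON object stored in the Desc column
val schema = new StructType()
  .add("Total", LongType, false)
  .add("Defective", LongType, false)

// Parse Desc into a struct column, then expand its fields into top-level columns
// (df is assumed to be the DataFrame read from the CSV)
df.select($"id", $"Category", from_json($"Desc", schema).as("desc"))
  .select($"id", $"Category", $"desc.*")
  .show(false)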

Output:

+---+--------+-----+---------+
|id |Category|Total|Defective|
+---+--------+-----+---------+
|201|MIS20   |200  |21       |
|202|MIS30   |740  |58       |
+---+--------+-----+---------+
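
If you only need a couple of keys and would rather not declare a schema, get_json_object is another option. The sketch below is a self-contained variant under a few assumptions: the object name SplitJsonColumn, the helper splitDesc, and the local[*] master are made up for illustration, and the two rows from the question are built in memory instead of being read from the CSV. get_json_object returns strings, so the values are cast to long.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, get_json_object}

// Hypothetical names for illustration only; not part of the original answer.
object SplitJsonColumn {

  // Extract the Total and Defective keys from the JSON string in Desc.
  // get_json_object yields a string column, hence the casts.
  def splitDesc(df: DataFrame): DataFrame =
    df.select(
      col("id"),
      col("Category"),
      get_json_object(col("Desc"), "$.Total").cast("long").as("Total"),
      get_json_object(col("Desc"), "$.Defective").cast("long").as("Defective")
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for trying this out
    val spark = SparkSession.builder()
      .appName("split-json-column")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question, standing in for the CSV
    val df = Seq(
      (201, "MIS20", """{ "Total": 200,"Defective": 21 }"""),
      (202, "MIS30", """{ "Total": 740,"Defective": 58 }""")
    ).toDF("id", "Category", "Desc")

    splitDesc(df).show(false)
    spark.stop()
  }
}
```

With many keys, or when you want the types checked up front, the from_json approach above is probably the tidier choice. When loading the real CSV, keep in mind that the JSON itself contains commas, so the Desc field has to be quoted in the file for the reader to keep it in one column.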

Hope this helps!

koiralo