I am using apache spark to parse json files . How to get nested key from json files wheather it's array or nested key

Question

I have multiple json files which keeps json data init. Json Structure look like this.

{ 
  "Name":"Vipin Suman",
  "Email":"vpn2330@gmail.com",
 "Designation":"Trainee Programmer",
 "Age":22 ,
 "location":
    {"City":
           {
            "Pin":324009,
            "City Name":"Ahmedabad"
           },
    "State":"Gujarat"
   },
 "Company":
          {
           "Company Name":"Elegant",
           "Domain":"Java"
          }, 
 "Test":["Test1","Test2"]

}

I tried this

    String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-03.json";

    String[] jsonFiles = jsonFilePath.split(",");

    Dataset<Row> people = sparkSession.read().json(jsonFiles);

i am getting schema for this is

root
 |-- Age: long (nullable = true)
 |-- Company: struct (nullable = true)
 |    |-- Company Name: string (nullable = true)
 |    |-- Domain: string (nullable = true)
 |-- Designation: string (nullable = true)
 |-- Email: string (nullable = true)
 |-- Name: string (nullable = true)
 |-- Test: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- location: struct (nullable = true)
 |    |-- City: struct (nullable = true)
 |    |    |-- City Name: string (nullable = true)
 |    |    |-- Pin: long (nullable = true)
 |    |-- State: string (nullable = true)

i am getting the view of table:-

    +---+--------------+------------------+-----------------+-----------+--------------+--------------------+
|Age|       Company|       Designation|            Email|       Name|          Test|            location|
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
| 22|[Elegant,Java]|Trainee Programmer|vpn2330@gmail.com|Vipin Suman|[Test1, Test2]|[[Ahmedabad,32400...|
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+

i want result as:-

            Age   |  Company Name    | Domain|  Designation |  Email           |    Name          |  Test   | City Name      |  Pin   |   State   |

           22     | Elegant MicroWeb | Java  |  Programmer  | vpn2330@gmail.com | Vipin Suman     | Test1  |  Ahmedabad      | 324009  | Gujarat 
           22     | Elegant MicroWeb | Java  |  Programmer  | vpn2330@gmail.com | Vipin Suman     | Test2  |  Ahmedabad      | 324009  |

how i can get table in above formet. i tried out everything. I am new to apache spark can any one help me out??

score 0 · Answer 1 · answered Jul 31 '17 at 02:36

I suggest you do your work in scala which is better supported by spark. To do your work, you can use "select" API to select a specific column, use alias to rename a column, and you can refer to here to say how to select complex data format(https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html)

Based on your result, you also need to use "explode" API (Flattening Rows in Spark)

score 0 · Answer 2 · answered Aug 01 '17 at 15:40

In Scala it could be done like this:

people.select(
  $"Age",
  $"Company.*",
  $"Designation",
  $"Email",
  $"Name",
  explode($"Test"),
  $"location.City.*",
  $"location.State")

Unfortunately, following code in Java would fail:

people.select(
  people.col("Age"),
  people.col("Company.*"),
  people.col("Designation"),
  people.col("Email"),
  people.col("Name"),
  explode(people.col("Test")),
  people.col("location.City.*"),
  people.col("location.State"));

You can use selectExpr instead though:

people.selectExpr(
  "Age",
  "Company.*",
  "Designation",
  "Email",
  "Name",
  "EXPLODE(Test) AS Test",
  "location.City.*",
  "location.State");

PS: You can pass the path to the directory or directories instead of the list of JSON files in sparkSession.read().json(jsonFiles);.

I am using apache spark to parse json files . How to get nested key from json files wheather it's array or nested key

2 Answers2