I would like to know an efficient approach here. Let's say we have JSON data with the following schema,
root
|-- fields: struct (nullable = true)
| |-- custid: string (nullable = true)
| |-- password: string (nullable = true)
| |-- role: string (nullable = true)
I can read this into a DataFrame and pull the nested fields up into top-level columns using,
jsonData_1.withColumn("custid", col("fields.custid")).withColumn("password", col("fields.password")).withColumn("role", col("fields.role"))
But if we have hundreds of nested columns, or if the columns are prone to change over time or contain further levels of nesting, I don't think hard-coding each column like this is a good approach. Is there a way to make the code automatically discover all the columns and sub-columns and build the DataFrame from the input JSON file, or is this the only good approach? Please share your ideas here.
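To make the question concrete, this is roughly the kind of schema-driven flattening I have in mind: a minimal Scala sketch (untested), where the flattenSchema helper and the "input.json" path are just placeholder names I made up. It walks df.schema recursively and selects every leaf field it finds; arrays are left as-is in this sketch.

import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StructField, StructType}

// Recursively walk a schema and return one Column per leaf field,
// addressing it by its full dotted path and aliasing it with underscores
// so that same-named fields in different structs don't collide.
def flattenSchema(schema: StructType, prefix: String = ""): Seq[Column] =
  schema.fields.toSeq.flatMap {
    case StructField(name, inner: StructType, _, _) =>
      flattenSchema(inner, s"$prefix$name.")          // descend into nested structs
    case StructField(name, _, _, _) =>
      Seq(col(s"$prefix$name").alias((prefix + name).replace(".", "_")))
  }

val spark = SparkSession.builder().appName("flatten-json").master("local[*]").getOrCreate()

// "input.json" is a placeholder path, not the real input.
val jsonData_1: DataFrame = spark.read.json("input.json")

// Select every leaf column discovered from the schema, however deeply it is nested.
val flattened = jsonData_1.select(flattenSchema(jsonData_1.schema): _*)
flattened.printSchema()

The alias step that replaces dots with underscores is only there to avoid ambiguous column names; whether that naming convention, or this whole approach, is sensible is exactly what I'm asking about.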