I am new to PySpark. I am trying to read the values of one of the nested columns in my JSON data. Here is my JSON structure:
|-- _index: string (nullable = true)
|-- _score: string (nullable = true)
|-- _source: struct (nullable = true)
| |-- layers: struct (nullable = true)
| | |-- R1.TEST6: struct (nullable = true)
| | | |-- R1.TEST1: struct (nullable = true)
| | | | |-- R1.TEST1.idx: string (nullable = true)
| | | | |-- R1.TEST1.ide: string (nullable = true)
| | | |-- R1.TEST3: struct (nullable = true)
| | | | |-- R1.TEST3.PDU: string (nullable = true)
| | | | |-- R1.TEST3.pdu: string (nullable = true)
| | | | |-- R1.TEST4: struct (nullable = true)
| | | | | |-- R1.TEST2: struct (nullable = true)
| | | | | | |-- R1.TEST2.agg: string (nullable = true)
| | | | | | |-- R1.TEST2.size: string (nullable = true)
| | | | | | |-- R1.TEST2.start: string (nullable = true)
| | | | | | |-- R1.TEST2.beam: string (nullable = true)
| | | | | | |-- R1.TEST2.startIndex: string (nullable = true)
| | | | | | |-- R1.TEST2.regType: string (nullable = true)
| | | | | | |-- R1.TEST2.coreSetType: string (nullable = true)
| | | | | | |-- R1.TEST2.cpType: string (nullable = true)
| | | | | | |-- R1.TEST2.column1: string (nullable = true)
| | | | | | |-- R1.TEST2.column1: string (nullable = true)
| | | | | | |-- R1.TEST2.column1: string (nullable = true)
| | | | | | |-- R1.TEST2.column1: string (nullable = true)
| | | | | | |-- R1.TEST2.column1: string (nullable = true)
| | | | | | |-- R1.TEST2.column1: string (nullable = true)
| | | | | | |-- R1.TEST2.column3: string (nullable = true)
Following this post, https://stackoverflow.com/questions/57811415/reading-a-nested-json-file-in-pyspark, I tried the following:
df2 = df.select(F.array(F.expr("_source.*")).alias("Source"))
Now I need to access the values under the R1.TEST6 tag.
But the code below does not work:
df2.withColumn("source_data", F.explode(F.arrays_zip("Source"))).select("source_data.Source.R1.TEST6.R1.TEST1.idx").show()
Can someone please help me access all the fields of this nested JSON and create a table from them? There are multiple levels of nesting under R1.TEST6 inside _source, so how do I use explode across that many levels?
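One possible direction, as a rough sketch and not a confirmed fix: the schema above contains only structs, while explode applies to array and map columns, so nested fields may be reachable with plain dotted paths. Because names such as R1.TEST6 contain dots themselves, each name has to be wrapped in backticks so Spark does not read the dot as another nesting level. The helpers below (`leaf_paths`, `quoted`, `flatten`, and the `example_schema` dict) are hypothetical names made up for illustration; they walk the dict returned by `df.schema.jsonValue()` and select directly from `df`, without the `F.array` wrapper used above.

```python
def leaf_paths(schema_json, prefix=()):
    """Yield a tuple of field names for every non-struct leaf.

    schema_json is the dict form of a struct type, e.g. df.schema.jsonValue().
    """
    for field in schema_json["fields"]:
        path = prefix + (field["name"],)
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Nested struct: recurse and keep extending the path.
            yield from leaf_paths(ftype, path)
        else:
            # Anything else (string, array, map, ...) is treated as a leaf.
            yield path


def quoted(path):
    """Backtick-quote each name so dots inside a name are not read as nesting."""
    return ".".join("`" + name + "`" for name in path)


def flatten(df):
    """Return df with one top-level column per leaf field (needs PySpark)."""
    from pyspark.sql import functions as F  # imported lazily on purpose

    cols = [
        # Alias without dots so the flattened columns are easy to reference.
        F.col(quoted(p)).alias("__".join(p).replace(".", "_"))
        for p in leaf_paths(df.schema.jsonValue())
    ]
    return df.select(*cols)


# Hypothetical, trimmed-down copy of the schema above, in jsonValue() form.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "_index", "type": "string", "nullable": True, "metadata": {}},
        {"name": "_source", "nullable": True, "metadata": {}, "type": {
            "type": "struct", "fields": [
                {"name": "layers", "nullable": True, "metadata": {}, "type": {
                    "type": "struct", "fields": [
                        {"name": "R1.TEST6", "nullable": True, "metadata": {}, "type": {
                            "type": "struct", "fields": [
                                {"name": "R1.TEST1", "nullable": True, "metadata": {}, "type": {
                                    "type": "struct", "fields": [
                                        {"name": "R1.TEST1.idx", "type": "string",
                                         "nullable": True, "metadata": {}},
                                    ]}},
                            ]}},
                    ]}},
            ]}},
    ],
}

for p in leaf_paths(example_schema):
    print(quoted(p))
```

With a real DataFrame this would be called as `flatten(df).show()`. A single field could also be selected by hand using the same quoting, for example `df.select(F.col("_source.layers.`R1.TEST6`.`R1.TEST1`.`R1.TEST1.idx`"))`, assuming the field names match the printed schema exactly.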