How to convert this nested json in pyspark?

Question

df.printSchema()

root
|-- trustset: struct (nullable = true)
|    |-- a: long (nullable = true)
|    |-- b: string (nullable = true)
|    |-- c: string (nullable = true)
|    |-- d: string (nullable = true)

Row(trustset=Row(a=1, b='Melbourne is in Aus', c=None, d='Sydney'))

df.show()

+--------------------+
|            trustset|
+--------------------+
|[1, Melbourne is ...|
+--------------------+

My desired output is:

+------+------------------------+-----+--------+
|   a  |    b                   |  c  |   d    |
+------+------------------------+-----+--------+
|   1  |  Melbourne is in Aus   | None| Sydney |
+------+------------------------+-----+--------+

I keep getting trustset as the only column, but I need its subfields (a, b, c, d) as the main columns.

df.select(df.col("trustset.*")) see https://stackoverflow.com/questions/38753898/how-to-flatten-a-struct-in-a-spark-dataframe — a.l., Mar 15 '19 at 05:55
Used this `df.select(df.columns("trustset.*"))` got `TypeError: 'list' object is not callable` used this `df.select(df.col("trustset.*"))` got `AttributeError: 'DataFrame' object has no attribute 'col'` — 艾瑪艾瑪艾瑪, Mar 15 '19 at 06:08
There is no such thing as a nested JSON. [JSON](http://json.org) is a text representation of some data structure. And there is no JSON in the question. — axiac, Mar 15 '19 at 06:25

score 1 · Accepted Answer · answered Mar 15 '19 at 06:20

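DataFrame's `select` can expand the nested fields of a struct into top-level columns. Use `col` from `pyspark.sql.functions` (PySpark DataFrames have no `df.col` method) with the `trustset.*` pattern: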

from pyspark.sql.functions import col
df.select(col("trustset.*")).show()

There's a similar question: [How to flatten a struct in a Spark dataframe?](https://stackoverflow.com/questions/38753898/how-to-flatten-a-struct-in-a-spark-dataframe)

a.l.
