df.printSchema()

root
|-- trustset: struct (nullable = true)
|    |-- a: long (nullable = true)
|    |-- b: string (nullable = true)
|    |-- c: string (nullable = true)
|    |-- d: string (nullable = true)

Row(trustset=Row(a=1, b='Melbourne is in Aus', c=None, d='Sydney'))

df.show()

+--------------------+
|            trustset|
+--------------------+
|[1, Melbourne is ...|
+--------------------+

My desired output is:

+------+------------------------+-----+--------+
|   a  |    b                   |  c  |   d    |
+------+------------------------+-----+--------+
|   1  |  Melbourne is in Aus   | None| Sydney |
+------+------------------------+-----+--------+

I keep getting trustset as a single column, but I need its subfields (a, b, c, d) as top-level columns.

  • df.select(df.col("trustset.*")) see https://stackoverflow.com/questions/38753898/how-to-flatten-a-struct-in-a-spark-dataframe – a.l. Mar 15 '19 at 05:55
  • Used this `df.select(df.columns("trustset.*"))` got `TypeError: 'list' object is not callable` used this `df.select(df.col("trustset.*"))` got `AttributeError: 'DataFrame' object has no attribute 'col'` – 艾瑪艾瑪艾瑪 Mar 15 '19 at 06:08
  • There is no such thing as a nested JSON. [JSON](http://json.org) is a text representation of some data structure. And there is no JSON in the question. – axiac Mar 15 '19 at 06:25

1 Answer

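DataFrame's select can expand the nested fields of a struct into top-level columns when you use the `.*` wildcard. In PySpark, `col` is a function in `pyspark.sql.functions`, not a DataFrame method, which is why `df.col(...)` raised an AttributeError.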

from pyspark.sql.functions import col
df.select(col("trustset.*")).show()
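If you only want some of the fields, or prefer to list them explicitly, a minimal sketch is to build dotted paths and pass them to select. The helper name `flattened_field_paths` is hypothetical and not part of PySpark; the commented lines assume a DataFrame `df` shaped like the one in the question.

```python
def flattened_field_paths(struct_name, field_names):
    """Return dotted paths such as 'trustset.a' for each field of a struct column."""
    return [f"{struct_name}.{field}" for field in field_names]


# Field names taken from the schema shown in the question.
paths = flattened_field_paths("trustset", ["a", "b", "c", "d"])
print(paths)

# Assuming a live DataFrame `df` with a `trustset` struct column:
#   df.select(*paths).show()
# The field names can also be read from the schema instead of hard-coding them:
#   df.schema["trustset"].dataType.fieldNames()
```

Selecting a nested field by its dotted path yields a column named after the leaf field (`a`, `b`, and so on), so the result has the same column names as the `trustset.*` form.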

There's a similar question: How to flatten a struct in a Spark dataframe? (https://stackoverflow.com/questions/38753898/how-to-flatten-a-struct-in-a-spark-dataframe)

a.l.