I have a DataFrame whose schema contains a struct nested within an array. I want to order the fields of that nested struct alphabetically.
The function suggested in a related question is complex and does not work for structs nested in arrays. Any help is appreciated.
I am working with PySpark 3.2.1.
My schema:
root
 |-- id: integer (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Dep: string (nullable = true)
 |    |    |-- ABC: string (nullable = true)
How it should look:
root
 |-- id: integer (nullable = true)
 |-- values: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- ABC: string (nullable = true)
 |    |    |-- Dep: string (nullable = true)
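Reproducible example:
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

# Reuse the active session if one exists, otherwise create one.
spark = SparkSession.builder.getOrCreate()

data = [
    (10, [{"Dep": 10, "ABC": 1}, {"Dep": 10, "ABC": 1}]),
    (20, [{"Dep": 20, "ABC": 1}, {"Dep": 20, "ABC": 1}]),
    (30, [{"Dep": 30, "ABC": 1}, {"Dep": 30, "ABC": 1}]),
    (40, [{"Dep": 40, "ABC": 1}, {"Dep": 40, "ABC": 1}]),
]
# "values" is an array of structs whose fields are declared out of alphabetical order.
myschema = StructType(
    [
        StructField("id", IntegerType(), True),
        StructField(
            "values",
            ArrayType(
                StructType([
                    StructField("Dep", StringType(), True),
                    StructField("ABC", StringType(), True),
                ])
            ),
        ),
    ]
)
df = spark.createDataFrame(data=data, schema=myschema)
df.printSchema()
df.show(10, False)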
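To make the goal concrete, here is a minimal sketch of one possible approach, not a confirmed solution: rebuild each array element with Spark SQL's `transform` higher-order function, listing the struct fields in sorted order. The helper name `sorted_struct_transform_expr` is hypothetical, and it only handles one level of nesting (an array of flat structs, as in the schema above). Note that Python's `sorted` is case-sensitive, so uppercase names sort before lowercase ones.

```python
def sorted_struct_transform_expr(array_col, field_names):
    """Build a Spark SQL expression string that rebuilds every struct in an
    array column with its fields listed in alphabetical order.

    Hypothetical helper for illustration only; handles a single level of nesting.
    """
    # Backticks quote the identifiers so unusual field names are still valid SQL.
    fields = ", ".join(f"x.`{name}` AS `{name}`" for name in sorted(field_names))
    return f"transform(`{array_col}`, x -> struct({fields}))"


# The generated expression for the example schema:
print(sorted_struct_transform_expr("values", ["Dep", "ABC"]))

# Assumed usage against the DataFrame above (needs a running SparkSession):
# from pyspark.sql import functions as F
# names = df.schema["values"].dataType.elementType.fieldNames()
# df = df.withColumn("values", F.expr(sorted_struct_transform_expr("values", names)))
# df.printSchema()
```

Reading the field names from `df.schema` rather than hard-coding them keeps the sketch usable when the struct gains fields, but deeper nesting (structs inside the struct) would need a recursive variant.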