Iterating over a complex DataFrame with an array of StructField

Question

I have data in one column of a DataFrame with the following schema:

<type 'list'>: [StructField(data,StructType(List(StructField(account,StructType(List(StructField(Id,StringType,true),StructField(Name,StringType,true),StructField(books,ArrayType(StructType(List(StructField(bookTile,StringType,true),StructField(bookId,StringType,true),StructField(bookName,StringType,true))),true),true)))))))]

I want to iterate over the entries, extract each value, and create a new DataFrame. Is there a built-in function in PySpark that supports this, or do I have to iterate over them myself? Is there an efficient way to do it?

There is an `explode` function that will put each element of the array on its own row. Is that what you want? - Shaido, Oct 23 '19 at 09:03
I tried it, but it failed with "due to data type mismatch: input to function explode should be array or map type, not struct". - syv, Oct 23 '19 at 09:11
Ah, I may have misunderstood you. It would be a bit clearer if you could add an example input and the expected output DataFrame to the question. However, it could be that you are looking for how to expand a struct: https://stackoverflow.com/questions/38753898/how-to-flatten-a-struct-in-a-spark-dataframe or maybe this: https://stackoverflow.com/questions/39275816/exploding-nested-struct-in-spark-dataframe - Shaido, Oct 23 '19 at 09:32

0 Answers