I have data in one column of a dataframe with the following schema:

<type 'list'>: [StructField(data,StructType(List(StructField(account,StructType(List(StructField(Id,StringType,true),StructField(Name,StringType,true),StructField(books,ArrayType(StructType(List(StructField(bookTile,StringType,true),StructField(bookId,StringType,true),StructField(bookName,StringType,true))),true),true)))))))]

I want to iterate over them, extract each value, and create a new dataframe. Are there any built-in functions in PySpark that support this, or should I iterate over them myself? Is there an efficient way?

syv
  • There is an `explode` function that will put each element of the array on its own row. Is that what you want? – Shaido Oct 23 '19 at 09:03
  • I tried it but it gave me "due to data type mismatch: input to function explode should be array or map type, not struct" – syv Oct 23 '19 at 09:11
  • Ah, I may have misunderstood you. It would be a bit clearer if you could add an example input/expected output dataframe to the question. However, it could be that you are looking for how to expand a struct: https://stackoverflow.com/questions/38753898/how-to-flatten-a-struct-in-a-spark-dataframe or maybe this: https://stackoverflow.com/questions/39275816/exploding-nested-struct-in-spark-dataframe – Shaido Oct 23 '19 at 09:32
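
Putting the two comments together, one possible approach is to address the struct fields by dotted path and call `explode` only on the array column, since `explode` rejects structs (hence the error quoted above). The sketch below is not taken from the thread: `leaf_paths` and `flatten_account` are hypothetical helper names, and it assumes the schema holds exactly one array column whose elements are structs, as in the question.

```python
def leaf_paths(schema_json, prefix=""):
    """Return (scalar_paths, array_paths) as dotted column names.

    `schema_json` is the dict form of a struct type, i.e. what
    `df.schema.jsonValue()` returns in PySpark.
    """
    scalars, arrays = [], []
    for field in schema_json["fields"]:
        path = prefix + field["name"]
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype["type"] == "struct":
            # Structs cannot be exploded; recurse and use dotted paths instead.
            sub_scalars, sub_arrays = leaf_paths(ftype, path + ".")
            scalars += sub_scalars
            arrays += sub_arrays
        elif isinstance(ftype, dict) and ftype["type"] == "array":
            # Arrays are the columns that explode() accepts.
            arrays.append(path)
        else:
            scalars.append(path)
    return scalars, arrays


def flatten_account(df):
    """Hypothetical helper: one output row per element of the single array column.

    Assumes exactly one array column and that its elements are structs.
    """
    # Imported lazily so leaf_paths stays usable without PySpark installed.
    from pyspark.sql import functions as F

    scalars, arrays = leaf_paths(df.schema.jsonValue())
    short_names = [p.split(".")[-1] for p in scalars]
    # Pull the nested scalar fields up to top level under their short names.
    cols = [F.col(p).alias(name) for p, name in zip(scalars, short_names)]
    # explode() drops rows whose array is null or empty;
    # F.explode_outer keeps them if that matters.
    exploded = df.select(*cols, F.explode(arrays[0]).alias("item"))
    # "item.*" expands the exploded struct into one column per field.
    return exploded.select(*short_names, "item.*")
```

For the schema in the question, `flatten_account(df)` should yield the columns `Id`, `Name`, `bookTile`, `bookId`, and `bookName`, with one row per book. Everything stays in DataFrame operations, so no Python-side iteration over rows is needed.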

0 Answers