Flattening complex data types in PySpark

Question

I have advanced data types in my DataFrame, such as arrays and other nested combinations. I am trying to write a generic function that flattens the DataFrame without my having to mention column names. Is there a library or function already available that makes this possible?

One such example of a schema present in the DataFrame:

 array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- col1: string (nullable = true)
 |    |    |-- col2: string (nullable = true)
 |    |    |-- col3: string (nullable = true)
 |    |    |-- col4: string (nullable = true)
 |    |    |-- col5: string (nullable = true)
 |    |    |-- col6: string (nullable = true)
 |    |    |-- col7: boolean (nullable = true)
 |    |    |-- col8: boolean (nullable = true)
 |    |    |-- col9: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- field1: string (nullable = true)
 |    |    |    |    |-- field2: string (nullable = true)
 |    |    |    |    |-- field3: boolean (nullable = true)
 |    |    |    |    |-- field4: string (nullable = true)
 |    |    |    |    |-- field5: string (nullable = true)

You can check this post: https://stackoverflow.com/questions/61863489/flatten-nested-json-in-scala-spark-dataframe/61863579#61863579. It is in Scala, but it will give you some idea. — Srinivas, Dec 09 '20 at 14:45
This is completely possible. Please tell us the level of flattening you need and how the new columns must be named, and post sample code that generates the DataFrame with these fields so that we can work on solving this. — Aditya Vikram Singh, Dec 09 '20 at 16:05
You can try this solution : https://stackoverflow.com/a/64464560/3238085 — user238607, Dec 10 '20 at 12:48
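For reference, one common way to approach this is to walk the schema and repeat two steps until no nested columns remain: explode each array column, and expand each struct column into one column per field. The sketch below is only an illustration under stated assumptions, not a library API: the function names `flatten_plan` and `flatten`, the `_` separator, and the `parent_child` naming scheme are all made up here. It also assumes the generated names do not collide with existing columns, and it leaves map columns untouched. I have not run it against the exact schema in the question.

```python
def flatten_plan(schema, sep="_"):
    """Return the ordered flattening steps for a schema.

    `schema` is the dict produced by `df.schema.jsonValue()`.
    Each step is ("explode", column) or ("expand", column, child_names).
    Hypothetical helper, written for this illustration.
    """
    steps = []
    # Columns still to inspect, as (column name, type) pairs.
    pending = [(f["name"], f["type"]) for f in schema["fields"]]
    while pending:
        name, dtype = pending.pop(0)
        if not isinstance(dtype, dict):
            continue  # atomic types are plain strings such as "string"
        if dtype["type"] == "array":
            steps.append(("explode", name))
            # After exploding, the column holds the element type.
            pending.insert(0, (name, dtype["elementType"]))
        elif dtype["type"] == "struct":
            children = [f["name"] for f in dtype["fields"]]
            steps.append(("expand", name, children))
            # Each field becomes a new top-level column to inspect.
            pending[:0] = [
                (f"{name}{sep}{f['name']}", f["type"]) for f in dtype["fields"]
            ]
        # Other complex types (for example maps) are left as they are.
    return steps


def flatten(df, sep="_"):
    """Apply the plan to a PySpark DataFrame (assumed naming: parent_child)."""
    # Imported here so that flatten_plan can be used without Spark installed.
    from pyspark.sql import functions as F

    for step in flatten_plan(df.schema.jsonValue(), sep):
        if step[0] == "explode":
            name = step[1]
            # explode_outer keeps rows whose array is null or empty.
            df = df.withColumn(name, F.explode_outer(F.col(f"`{name}`")))
        else:
            _, name, children = step
            expanded = [
                F.col(f"`{name}`.`{child}`").alias(f"{name}{sep}{child}")
                for child in children
            ]
            df = df.select("*", *expanded).drop(name)
    return df
```

Calling `flatten(df)` should then return a DataFrame containing only atomic (and map) columns. Note that every exploded array multiplies the number of rows, so a record with several independent arrays ends up as the cross product of their elements. If that is not what you need, the level of flattening has to be restricted, as the comment above points out.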


0 Answers