
I have advanced data types in my DataFrame, such as arrays of structs and other nested combinations. I am trying to write a generic function that flattens the DataFrame without hard-coding any column names. Is there a library already available, or some function, that makes this possible?

One example of a schema present in the DataFrame:

 array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- col1: string (nullable = true)
 |    |    |-- col2: string (nullable = true)
 |    |    |-- col3: string (nullable = true)
 |    |    |-- col4: string (nullable = true)
 |    |    |-- col5: string (nullable = true)
 |    |    |-- col6: string (nullable = true)
 |    |    |-- col7: boolean (nullable = true)
 |    |    |-- col8: boolean (nullable = true)
 |    |    |-- col9: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- field1: string (nullable = true)
 |    |    |    |    |-- field2: string (nullable = true)
 |    |    |    |    |-- field3: boolean (nullable = true)
 |    |    |    |    |-- field4: string (nullable = true)
 |    |    |    |    |-- field5: string (nullable = true)
Jay
  • Can you add the schema of your data? – Srinivas Dec 09 '20 at 14:27
  • Added it to the question itself. I am using Spark version 2.3.2. – Jay Dec 09 '20 at 14:31
  • You can check https://stackoverflow.com/questions/61863489/flatten-nested-json-in-scala-spark-dataframe/61863579#61863579. It is in Scala, but it will give you some ideas. – Srinivas Dec 09 '20 at 14:45
  • 1
    this is completely possible , please help me the with level of flattening you need , and how the new columns must be named and post a sample code for generation of the dataframe with fields so that we can work to solve this parsing – Aditya Vikram Singh Dec 09 '20 at 16:05
  • You can try this solution : https://stackoverflow.com/a/64464560/3238085 – user238607 Dec 10 '20 at 12:48
