I am looking for an efficient way to dynamically flatten a Parquet file in Spark with Scala.
The Parquet file contains Array and Struct types nested at multiple depth levels. The schema can change in the future, so I cannot hard-code any attribute names. The desired end result is a flattened, delimited file.
Would a solution that uses flatMap and recursively explodes the nested columns work?
Example Schema:
|-- exCar: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- exCarOne: string (nullable = true)
| | |-- exCarTwo: string (nullable = true)
| | |-- exCarThree: string (nullable = true)
|-- exProduct: string (nullable = true)
|-- exName: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- exNameOne: string (nullable = true)
| | |-- exNameTwo: string (nullable = true)
| | |-- exNameThree: string (nullable = true)
| | |-- exNameFour: string (nullable = true)
| | |-- exNameCode: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- exNameCodeOne: string (nullable = true)
| | | | |-- exNameCodeTwo: string (nullable = true)
| | |-- exColor: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- exColorOne: string (nullable = true)
| | | | |-- exColorTwo: string (nullable = true)
| | | | |-- exWheelColor: array (nullable = true)
| | | | | |-- element: struct (containsNull = true)
| | | | | | |-- exWheelColorOne: string (nullable = true)
| | | | | | |-- exWheelColorTwo: string (nullable = true)
| | | | | | |-- exWheelColorThree: string (nullable = true)
| | |-- exGlass: string (nullable = true)
|-- exDetails: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- exBill: string (nullable = true)
| | |-- exAccount: string (nullable = true)
| | |-- exLoan: string (nullable = true)
| | |-- exRate: string (nullable = true)
Desired output Schema:
exCar.exCarOne
exCar.exCarTwo
exCar.exCarThree
exProduct
exName.exNameOne
exName.exNameTwo
exName.exNameThree
exName.exNameFour
exName.exNameCode.exNameCodeOne
exName.exNameCode.exNameCodeTwo
exName.exColor.exColorOne
exName.exColor.exColorTwo
exName.exColor.exWheelColor.exWheelColorOne
exName.exColor.exWheelColor.exWheelColorTwo
exName.exColor.exWheelColor.exWheelColorThree
exName.exGlass
exDetails.exBill
exDetails.exAccount
exDetails.exLoan
exDetails.exRate
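For reference, this is a minimal sketch of the recursive idea I have in mind, not a tested or tuned solution. It reads the schema at runtime, so no attribute names are hard-coded. The names FlattenSketch, flatten and q are hypothetical helpers I made up for illustration. It assumes Spark 2.2 or later (for explode_outer) and only handles ArrayType and StructType; MapType columns are left untouched.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, explode_outer}
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical helper object, for illustration only.
object FlattenSketch {

  // Backtick-quote a name so dots inside it (e.g. "exName.exColor")
  // are read as part of the column name, not as struct field access.
  private def q(name: String): Column = col(s"`$name`")

  // Repeatedly find the first array or struct column and unnest it by
  // one level, until only non-nested columns remain.
  @annotation.tailrec
  def flatten(df: DataFrame): DataFrame = {
    val fields = df.schema.fields.toSeq
    val nested = fields.find { f =>
      f.dataType.isInstanceOf[ArrayType] || f.dataType.isInstanceOf[StructType]
    }
    nested match {
      case None => df // nothing left to flatten
      case Some(f) =>
        val cols: Seq[Column] = fields.flatMap { g =>
          if (g.name != f.name) Seq(q(g.name))
          else g.dataType match {
            // Array: one output row per element; explode_outer keeps
            // rows whose array is null or empty (the value becomes null).
            case _: ArrayType =>
              Seq(explode_outer(q(g.name)).as(g.name))
            // Struct: promote each child to a top-level column named
            // "parent.child".
            case st: StructType =>
              st.fieldNames.toSeq.map { c =>
                q(g.name).getField(c).as(s"${g.name}.$c")
              }
            case _ => Seq(q(g.name))
          }
        }
        // Only one generator (explode) is used per select.
        flatten(df.select(cols: _*))
    }
  }
}
```

Usage would be something like the following, where the paths and the delimiter are placeholders:

```scala
val flat = FlattenSketch.flatten(spark.read.parquet("/path/to/input.parquet"))
flat.write.option("header", "true").option("sep", "|").csv("/path/to/output")
```

One thing I am unsure about is efficiency: exploding sibling arrays such as exCar, exName and exDetails one after another multiplies the row count (a cross product of their elements per input row), so I do not know whether this scales.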