I am working with a DataFrame whose schema looks like this:
root
|-- _id: string (nullable = true)
|-- positions: struct (nullable = true)
| |-- precise: struct (nullable = true)
| | |-- lat: double (nullable = true)
| | |-- lng: double (nullable = true)
| |-- unprecise: struct (nullable = true)
| | |-- lat: double (nullable = true)
| | |-- lng: double (nullable = true)
The positions struct can contain "precise", "unprecise", both, or several other struct fields. So a row that has both a precise and an unprecise location should be exploded into two rows.
What would be the best way to explode such a DataFrame? Ideally I would like to end up with:
root
|-- _id: string (nullable = true)
|-- positions_type: string (nullable = true) // "precise" or "unprecise"
|-- lat: double (nullable = true)
|-- lng: double (nullable = true)
I have looked at Exploding nested Struct in Spark dataframe, but it is about exploding a Struct column, not a nested Struct.
Another idea would be to flatten everything and end up with as many columns as there are nested structs, but that is not ideal because the schema would change whenever a new struct field is added.
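For what it is worth, here is a rough sketch (in Scala) of the kind of dynamic approach I have in mind, assuming the DataFrame is called df and that every sub-struct of positions exposes lat and lng fields; I am not sure this is the idiomatic way to do it:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.types.StructType

def explodePositions(df: DataFrame): DataFrame = {
  // Discover the sub-struct names ("precise", "unprecise", ...) from the schema,
  // so new position types do not require code changes.
  val positionTypes = df.schema("positions").dataType
    .asInstanceOf[StructType].fieldNames

  // Build one (positions_type, lat, lng) struct per sub-struct and collect them in an array.
  val positionsArray = array(positionTypes.map { name =>
    struct(
      lit(name).as("positions_type"),
      col(s"positions.$name.lat").as("lat"),
      col(s"positions.$name.lng").as("lng")
    )
  }: _*)

  // Explode the array into one row per position type, then drop the entries
  // coming from sub-structs that were null on that row (their lat is null).
  df.select(col("_id"), explode(positionsArray).as("p"))
    .where(col("p.lat").isNotNull)
    .select(col("_id"), col("p.positions_type"), col("p.lat"), col("p.lng"))
}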
Thanks in advance.