Dataframe1 looks like this
root
|-- source: string (nullable = true)
|-- results: array (nullable = true)
| |-- content: struct (containsNull = true)
| | |-- ptype: string (nullable = true)
| | |-- domain: string (nullable = true)
| | |-- verb: string (nullable = true)
| | |-- foobar: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
| | |-- fooId: integer (nullable = true)
|-- date: string (nullable = false)
|-- hour: string (nullable = false)
Dataframe 2 look like below:
root
|-- source: string (nullable = true)
|-- results: array (nullable = true)
| |-- content: struct (containsNull = true)
| | |-- ptype: string (nullable = true)
| | |-- domain: string (nullable = true)
| | |-- verb: string (nullable = true)
| | |-- foobar: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
|-- date: string (nullable = false)
|-- hour: string (nullable = false)
Notice the differnce - there is no fooId
in the second dataframe.
How can I union these two dataframes together?
I understand that the two schemas need to be the same to union. What is the best way to add fooId
or remove fooId
?(non trivial because of the structure of the schema) What is the recommended approach for doing union of this kind.
Thanks