I am new to Scala and Spark. I have a question about replacing a nested array in my DataFrame.
This is my DataFrame schema:
root
|-- dedupeMode: string (nullable = true)
|-- modules: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- content: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- id: string (nullable = true)
| | | | |-- weight: double (nullable = true)
| | |-- id: string (nullable = true)
| | |-- randomize: boolean (nullable = true)
|-- vars: struct (nullable = true)
| |-- test_group: string (nullable = true)
| |-- vbs: string (nullable = true)
I want to get rid of the content array inside each element of the modules array. More precisely, I want to replace it with an empty content array, so that when I write the DataFrame out as JSON each module has "content": [].
I have tried:
dataFrame.drop("modules.content")
I also went through the solutions at "Dropping a nested column from Spark DataFrame", but they did not solve my problem. I have also tried other variations without success. What would you recommend?
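One possible direction, offered as a minimal sketch rather than a confirmed solution: as far as I understand, drop only removes top-level columns, so a dotted path such as "modules.content" is not resolved into the array elements. The sketch below instead rebuilds each element of modules with the SQL higher-order functions transform and filter. It assumes Spark 2.4 or later (where those SQL functions are available) and the schema shown above. The names EmptyNestedContent and withEmptyContent, and the sample row, are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical helper object; the names are illustrative only.
object EmptyNestedContent {

  // Rebuilds every struct in `modules`, keeping `id` and `randomize` and
  // replacing `content` with an empty array of the same element type.
  // filter(..., c -> false) keeps no elements, so the array type is preserved
  // without spelling out the element struct. Assumes Spark 2.4+.
  def withEmptyContent(df: DataFrame): DataFrame =
    df.withColumn(
      "modules",
      expr(
        """transform(modules, m -> named_struct(
          |  'content', filter(m.content, c -> false),
          |  'id', m.id,
          |  'randomize', m.randomize))""".stripMargin
      )
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-content-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row shaped like the schema in the question.
    val sample = Seq(
      """{"dedupeMode":"a","modules":[{"content":[{"id":"c1","weight":0.5}],"id":"m1","randomize":true}],"vars":{"test_group":"g","vbs":"v"}}"""
    ).toDS()
    val df = spark.read.json(sample)

    // Print the transformed rows as JSON strings.
    withEmptyContent(df).toJSON.collect().foreach(println)

    spark.stop()
  }
}
```

Two caveats with this sketch: every field of the module struct has to be listed by hand in named_struct, and a module whose content is null stays null (filter on a null array yields null), so that field would be omitted from the JSON rather than written as an empty array.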