
I am new to Scala and Spark. I have a question on dropping a nested array from my DataFrame.

This is my DataFrame schema:

root
 |-- dedupeMode: string (nullable = true)
 |-- modules: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- content: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |-- weight: double (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- randomize: boolean (nullable = true)
 |-- vars: struct (nullable = true)
 |    |-- test_group: string (nullable = true)
 |    |-- vbs: string (nullable = true)

I want to get rid of the content array inside the modules array. Actually, I want to replace it with an empty content array, so that when I write the JSON it comes out as content: [].

I have tried:

dataFrame.drop("modules.content")

and also went through the solutions at Dropping a nested column from Spark DataFrame,

but they did not solve my problem. I have also tried other variations without success. What would you recommend?
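One possible sketch, assuming Spark 2.4 or later (where the transform and filter higher-order functions are available via expr) and assuming the field names from the schema above: drop cannot remove a field nested inside an array, so instead each struct in modules can be rebuilt with its content array emptied. Using filter with an always-false predicate keeps the element type of content intact, so the schema does not change. The variable name dataFrame is taken from the question; this is untested against your data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object EmptyNestedArray {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-nested-array")
      .master("local[*]")
      .getOrCreate()

    // Sample data shaped like the schema in the question (assumption).
    val dataFrame = spark.read.json(spark.createDataset(Seq(
      """{"dedupeMode":"strict",
         |"modules":[{"content":[{"id":"c1","weight":0.5}],"id":"m1","randomize":true}],
         |"vars":{"test_group":"A","vbs":"x"}}""".stripMargin
    ))(org.apache.spark.sql.Encoders.STRING))

    // Rebuild each element of `modules`, replacing `content` with an
    // empty array of the same element type. `filter(..., x -> false)`
    // yields an empty array without changing the schema.
    val result = dataFrame.withColumn(
      "modules",
      expr("""transform(modules, m -> struct(
             |  filter(m.content, x -> false) AS content,
             |  m.id AS id,
             |  m.randomize AS randomize))""".stripMargin)
    )

    result.toJSON.show(truncate = false) // content serializes as []
    spark.stop()
  }
}
```

On older Spark versions (pre-2.4) the same rebuild would need a UDF or a map over a typed Dataset, since higher-order array functions are not available there.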

  • I have seen that question already. Sorry, but I think there should be a better solution for this problem. That question is 2y 1m old. – esbej Nov 09 '17 at 12:09
  • Could you clarify specifically which answers you have already tried and how it failed for you? Otherwise it's hard to know what exact problem you're having. – Iguananaut Nov 10 '17 at 12:51
  • Did you get the answer for this? – Spark-Beginner Jun 14 '18 at 19:17

0 Answers