
I am new to Scala and Spark. I have a question on dropping a nested array from my DataFrame.

This is my DataFrame schema:

root
 |-- dedupeMode: string (nullable = true)
 |-- modules: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- content: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- id: string (nullable = true)
 |    |    |    |    |-- weight: double (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- randomize: boolean (nullable = true)
 |-- vars: struct (nullable = true)
 |    |-- test_group: string (nullable = true)
 |    |-- vbs: string (nullable = true)

I want to get rid of the content array inside the modules array. Actually, I want to replace it with an empty content array, so that when I write the JSON it comes out as content: [].

I have tried:

dataFrame.drop("modules.content")

and also went through the solutions at Dropping a nested column from Spark DataFrame,

but they did not solve my problem. I have also tried other variations without success. What would you recommend?
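One possible sketch, assuming Spark 2.4 or later (where the transform and filter higher-order functions are available via expr) and assuming the field names from the schema above: drop cannot remove a field nested inside an array, so instead each struct in modules can be rebuilt with its content array emptied. Using filter with an always-false predicate keeps the element type of content intact, so the schema does not change. The variable name dataFrame is taken from the question; this is untested against your data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object EmptyNestedArray {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-nested-array")
      .master("local[*]")
      .getOrCreate()

    // Sample data shaped like the schema in the question (assumption).
    val dataFrame = spark.read.json(spark.createDataset(Seq(
      """{"dedupeMode":"strict",
         |"modules":[{"content":[{"id":"c1","weight":0.5}],"id":"m1","randomize":true}],
         |"vars":{"test_group":"A","vbs":"x"}}""".stripMargin
    ))(org.apache.spark.sql.Encoders.STRING))

    // Rebuild each element of `modules`, replacing `content` with an
    // empty array of the same element type. `filter(..., x -> false)`
    // yields an empty array without changing the schema.
    val result = dataFrame.withColumn(
      "modules",
      expr("""transform(modules, m -> struct(
             |  filter(m.content, x -> false) AS content,
             |  m.id AS id,
             |  m.randomize AS randomize))""".stripMargin)
    )

    result.toJSON.show(truncate = false) // content serializes as []
    spark.stop()
  }
}
```

On older Spark versions (pre-2.4) the same rebuild would need a UDF or a map over a typed Dataset, since higher-order array functions are not available there.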

  • I have seen that question already. Sorry, but I think there should be a better solution for this problem. That question is 2y 1m old. – esbej Nov 09 '17 at 12:09
  • Could you clarify specifically which answers you have already tried and how it failed for you? Otherwise it's hard to know what exact problem you're having. – Iguananaut Nov 10 '17 at 12:51
  • Did you get the answer for this? – Spark-Beginner Jun 14 '18 at 19:17

0 Answers