Suppose I have the following schema in a PySpark DataFrame:

arr1:array
   element:struct
     string1:string
     arr2:array
       element:string
     string2:string

How can I remove the nested field arr2 from every struct in arr1?

Paul Velthuis

Use `to_json` + `from_json`; see a similar post: https://stackoverflow.com/questions/58243292 – jxc Oct 15 '19 at 17:58

1 Answer

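The drop function on its own is not enough here: it only removes top-level columns, so it cannot remove a field nested inside an array of structs.

Nested fields are selected with a dot, like window.start and window.end. In your schema that path is arr1.arr2, not arr1.element.arr2: "element" is only the label printSchema gives to the items of an array, not a field name.

To remove arr2, rebuild arr1 without it. On Spark 2.4+ the SQL transform function can do this:

from pyspark.sql.functions import expr
df = df.withColumn("arr1", expr("transform(arr1, x -> named_struct('string1', x.string1, 'string2', x.string2))"))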
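
A minimal sketch of removing a nested field by rebuilding every struct in the array without it, assuming Spark 2.4+ (for the SQL `transform` and `named_struct` functions). The helper names `keep_fields_expr` and `drop_nested_field` are hypothetical and only for illustration; the field names are read from the DataFrame schema, so nothing has to be typed by hand.

```python
def keep_fields_expr(array_col, keep_fields, var="x"):
    """Build a Spark SQL expression that rebuilds each struct in
    `array_col` with only the fields listed in `keep_fields`."""
    pairs = ", ".join(f"'{f}', {var}.{f}" for f in keep_fields)
    return f"transform({array_col}, {var} -> named_struct({pairs}))"


def drop_nested_field(df, array_col, field):
    """Return `df` with `field` removed from every struct in `array_col`."""
    # Imported lazily so the expression builder above works without Spark.
    from pyspark.sql.functions import expr

    # Field names of the structs stored in the array, taken from the schema.
    struct_type = df.schema[array_col].dataType.elementType
    keep = [name for name in struct_type.fieldNames() if name != field]
    return df.withColumn(array_col, expr(keep_fields_expr(array_col, keep)))


# Show the expression generated for the schema in the question.
print(keep_fields_expr("arr1", ["string1", "string2"]))
```

With a DataFrame in hand, the call would be `df = drop_nested_field(df, "arr1", "arr2")`. On Spark 3.1+, `pyspark.sql.functions.transform` combined with `Column.dropFields` is an alternative that avoids building a SQL string.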
pissall