I have a Spark DataFrame in Java with nested columns.

Below is the .printSchema() output:

root
 |-- BeginDateTime: struct (nullable = true)
 |    |-- _XmlNodeValue: string (nullable = true)
 |-- BusinessDayDate: string (nullable = true)
 |-- BusinessUnit: struct (nullable = true)
 |    |-- UnitID: struct (nullable = true)

I can use dataframe.drop("BeginDateTime") to drop the BeginDateTime column (as well as any other column directly under the root node).

But if I call dataframe.drop("BusinessUnit.UnitID"), the nested column is not dropped.

I tried this with other nested columns and they all behave the same way: drop() does not remove a nested column.

There are lots of answers to this question, but they are all in Scala/Python. I have to work in a Java environment.

There is one thread (Dropping a nested column from Spark DataFrame) whose last answer is in Java, but I can't use that code because line 8 and line 27 fail to compile.

The errors are "no method col" and "no method struct".

Can someone provide a working solution in Java?

Thanks

DennisLi
milton
  • you can import col and struct methods with: ```import static org.apache.spark.sql.functions.col```, ```import static org.apache.spark.sql.functions.struct``` – vinsce Aug 12 '19 at 20:34
  • @vinsce The col import did work. The error I get now is with struct. IntelliJ gives me two options: 1. Make getColumn return 'scala.collection.Seq'; 2. Create method 'struct' in xxxx – milton Aug 12 '19 at 20:44
  • struct(scala.collection.Seq<org.apache.spark.sql.Column>) in functions cannot be applied to (org.apache.spark.sql.Column[]) – milton Aug 13 '19 at 14:42
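
One possible way around the struct overload error in the comments above is to skip functions.struct entirely and rebuild the parent column from a Spark SQL expression string. The sketch below is plain Java and only builds that string. The class NestedStructExpr and its method rebuildWithout are hypothetical helpers, not part of Spark. It assumes field names contain no single quotes or backticks. The sibling field TypeCode used in the example further down is made up, since the schema in the question is cut off after UnitID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spark): builds a Spark SQL expression that
// recreates a struct column without one of its direct child fields.
class NestedStructExpr {

    // parent:   name of the top-level struct column, e.g. "BusinessUnit"
    // children: names of all fields directly under that struct
    // toDrop:   name of the child field to leave out
    static String rebuildWithout(String parent, List<String> children, String toDrop) {
        List<String> parts = new ArrayList<>();
        for (String child : children) {
            if (child.equals(toDrop)) {
                continue; // skip the field being dropped
            }
            // named_struct takes alternating (name, value) arguments
            parts.add("'" + child + "', `" + parent + "`.`" + child + "`");
        }
        if (parts.isEmpty()) {
            // Nothing would remain, so the caller should drop the parent column.
            throw new IllegalArgumentException(
                "No fields left under " + parent + "; drop the whole column instead");
        }
        return "named_struct(" + String.join(", ", parts) + ")";
    }
}
```

For a BusinessUnit struct with children UnitID and TypeCode, rebuildWithout("BusinessUnit", Arrays.asList("UnitID", "TypeCode"), "UnitID") returns named_struct('TypeCode', `BusinessUnit`.`TypeCode`). That string could then be passed to dataframe.withColumn("BusinessUnit", org.apache.spark.sql.functions.expr(...)), with the child names read from ((StructType) dataframe.schema().apply("BusinessUnit").dataType()).fieldNames(). Note that rows where BusinessUnit is null would end up with a non-null struct of null fields.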
