I have a DataFrame like this:
val df = Seq(
Seq(("a","b","c"))
)
.toDF("arr")
.select($"arr".cast("array<struct<c1:string,c2:string,c3:string>>"))
df.printSchema
root
|-- arr: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- c1: string (nullable = true)
| | |-- c2: string (nullable = true)
| | |-- c3: string (nullable = true)
df.show()
+---------+
|      arr|
+---------+
|[[a,b,c]]|
+---------+
I want to keep only c1 and c3 in each struct element, so that the result looks like this:
df.printSchema
root
|-- arr: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- c1: string (nullable = true)
| | |-- c3: string (nullable = true)
df.show()
+-------+
|    arr|
+-------+
|[[a,c]]|
+-------+
Can this be done without a UDF?
I can do it with a UDF, but I'd like a solution without one, something like
df
.select($"arr.c1".as("arr"))
.printSchema
root
|-- arr: array (nullable = true)
| |-- element: string (containsNull = true)
But this only works for selecting a single struct field. I've also tried:
df
.select(array(struct($"arr.c1",$"arr.c3")).as("arr"))
.printSchema
but this gives an array holding one struct of arrays, rather than an array of structs:
root
|-- arr: array (nullable = false)
| |-- element: struct (containsNull = false)
| | |-- c1: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- c3: array (nullable = true)
| | | |-- element: string (containsNull = true)
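For reference, here is a minimal sketch of one UDF-free direction, assuming Spark 2.4 or later, where the SQL higher-order function `transform` and `named_struct` are available through `expr`. The session setup and the name `result` are illustrative assumptions, not part of the original code. The idea is to rebuild each array element as a new struct containing only c1 and c3, instead of extracting whole columns of fields as `$"arr.c1"` does.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: a local session for experimentation; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("select-struct-fields")
  .getOrCreate()
import spark.implicits._

// Same input as in the question: one row holding an array with one struct (c1, c2, c3).
val df = Seq(Seq(("a", "b", "c")))
  .toDF("arr")
  .select($"arr".cast("array<struct<c1:string,c2:string,c3:string>>"))

// Assumption: Spark 2.4+. transform applies the lambda to every array element;
// named_struct builds a struct with explicit field names from the chosen fields.
val result = df.select(
  expr("transform(arr, x -> named_struct('c1', x.c1, 'c3', x.c3))").as("arr")
)

result.printSchema()
```

On versions before 2.4 this expression is not available, so the sketch would not apply there.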