I have a complex DataFrame structure and would like to set a nested column to null easily. I've created implicit classes that wire in functionality and handle flat (2D) DataFrame structures without trouble, but once the DataFrame includes ArrayType or MapType columns I haven't had much luck.
For example, I have a schema defined as:
StructType(
  StructField(name,StringType,true),
  StructField(data,ArrayType(
    StructType(
      StructField(name,StringType,true),
      StructField(values,
        MapType(StringType,StringType,true),
        true)
    ),
    true
  ),
  true)
)
I'd like to produce a new DataFrame in which the data.values field (the MapType) is set to null, but because it is a field of the structs inside an array, I have not been able to figure out how. I would think it would be similar to:
df.withColumn("data.values", functions.array(functions.lit(null)))
but this creates a new top-level column named data.values and does not modify the values field inside the elements of the data array.
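A minimal reproduction might look like the sketch below. It assumes a local SparkSession with spark-sql on the classpath. The object name NestedNullRepro and the single sample row are hypothetical, made up only to give the schema some data.

```scala
import org.apache.spark.sql.{Row, SparkSession, functions}
import org.apache.spark.sql.types._

// Hypothetical object name; the schema itself is the one from the question.
object NestedNullRepro {
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("data", ArrayType(
      StructType(Seq(
        StructField("name", StringType, nullable = true),
        StructField("values",
          MapType(StringType, StringType, valueContainsNull = true),
          nullable = true)
      )),
      containsNull = true
    ), nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded session is enough for the repro.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-null-repro")
      .getOrCreate()

    // One made-up row: an outer name plus an array holding a single struct.
    val rows = Seq(Row("outer", Seq(Row("inner", Map("k" -> "v")))))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

    // The attempt from the question.
    val attempt = df.withColumn("data.values", functions.array(functions.lit(null)))

    // Print the resulting schema and top-level column names for inspection.
    attempt.printSchema()
    println(attempt.columns.mkString(", "))

    spark.stop()
  }
}
```

Comparing the printed schema of attempt against the original schema should show whether the nested values field was touched or a separate top-level column was added.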