
I am currently working with a DataFrame in Scala and can't figure out how to fill a column with a Seq.empty[Row] value when the value in that row is null. I understand there is the df.na.fill function; however, from my understanding it only handles strings, doubles, or ints.

Essentially, I want to be able to fill the null value with a Seq.empty[Row] for a nested field. I'm sure this is possible, but I can't find anything like it online.
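For reference, a minimal sketch of the kinds of replacements df.na.fill does accept (the column names here are hypothetical):

df.na.fill("missing", Seq("field"))       // fills nulls in a string column
df.na.fill(0.0, Seq("someDoubleCol"))     // fills nulls in a double column
// there is no overload that accepts an array or Seq[Row] replacement value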

Table example --

+------+-------+-----------------+
| field| field2|           values|
+------+-------+-----------------+
|stuff2|stuff 2|             null|
|stuff1| stuff1| [val 1,val2,..] |
+------+-------+-----------------+

Essentially, I want that null to turn into an empty array like so: []. Any help would be appreciated, thank you.

Daniel Dao

1 Answer


You can do it using a udf and coalesce, as shown below.

val df = Seq(("stuff2","stuff2",null), ("stuff2","stuff2",Array("value1","value2")),("stuff3","stuff3",Array("value3"))).toDF("field","field2","values")
df.show()

import org.apache.spark.sql.functions.udf
val array_ = udf(() => Array.empty[String])

val df2 = df.withColumn("values", coalesce(df("values"), array_()))
df2.show()
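With the sample rows above, df2.show() should print something like this (exact column widths may differ):

+------+------+----------------+
| field|field2|          values|
+------+------+----------------+
|stuff2|stuff2|              []|
|stuff2|stuff2|[value1, value2]|
|stuff3|stuff3|        [value3]|
+------+------+----------------+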
abaghel
  • There's a lot of difficulty in converting a `null` to an empty `Seq` of `Row` perfectly, because `coalesce` depends on the type of the column being correct. I'm having trouble inferring the type of the column effectively, and I think that's a different question. However your point still stands, so thank you for the answer. – Daniel Dao Dec 22 '16 at 15:24
  • Clever hack using `udf` to inject non-literal value! – metasim Apr 24 '17 at 12:26
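Following up on the first comment: one way to keep coalesce type-correct for nested columns is to read the column's actual DataType from the schema and cast an empty array() literal to it. This is only a sketch of that idea, not part of the accepted answer; it assumes Spark 3.x, where an empty array() literal has element type null and can therefore be cast to any array type, including array<struct<...>>, and df3 is just a hypothetical name.

import org.apache.spark.sql.functions.{array, coalesce, col}

// Look up the actual type of the "values" column from the schema, then cast
// an empty array() literal to that type so both arguments of coalesce agree,
// even when the column is a nested array<struct<...>>.
val valuesType = df.schema("values").dataType
val df3 = df.withColumn("values", coalesce(col("values"), array().cast(valuesType)))
df3.show()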