The issues:
1) Spark doesn't call a UDF when its parameter is of a primitive type and the input column contains null; the result for that row is simply null:
inputDF.show()
+-----+
|    x|
+-----+
| null|
|  1.0|
+-----+
inputDF
  .withColumn("y",
    udf { (x: Double) => 2.0 }.apply($"x") // will not be invoked if $"x" is null
  )
  .show()
+-----+-----+
|    x|    y|
+-----+-----+
| null| null|
|  1.0|  2.0|
+-----+-----+
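One possible workaround is to declare the UDF parameter as the boxed `java.lang.Double` rather than the primitive `Double`, so that the function is invoked for null rows as well and can handle them explicitly. The sketch below assumes the `spark-sql` dependency is on the classpath and uses a local session; the names `BoxedInputExample` and `handleNull` and the fallback value `-1.0` are illustrative and not part of the original.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object BoxedInputExample {
  // Hypothetical helper: a boxed parameter can actually be null,
  // so the null case is handled inside the function.
  val handleNull: java.lang.Double => Double =
    x => if (x == null) -1.0 else 2.0

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("boxed-input-example")
      .getOrCreate()
    import spark.implicits._

    // Same shape as inputDF above: one nullable double column "x".
    val inputDF = Seq[Option[Double]](None, Some(1.0)).toDF("x")

    // The UDF takes java.lang.Double, so it should also run for the null row.
    inputDF.withColumn("y", udf(handleNull).apply($"x")).show()

    spark.stop()
  }
}
```

The trade-off is that every such UDF has to carry its own null check, which is easy to forget.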
2) A UDF can't return null for a column of primitive type:
udf { (x: String) => null: Double } // compile error
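A common way around this is to return `Option[Double]` instead of `Double`; Spark maps `None` to null in a nullable double column. The following is a minimal sketch under the same assumptions as before (`spark-sql` on the classpath, a local session); `NullableOutputExample`, `parse`, and the sample strings are hypothetical names and data chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import scala.util.Try

object NullableOutputExample {
  // Hypothetical helper: None stands in for the null
  // that a primitive Double cannot represent.
  val parse: String => Option[Double] =
    s => Try(s.toDouble).toOption

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullable-output-example")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("1.5", "abc").toDF("s")

    // Rows that fail to parse should end up as null in column "d".
    df.withColumn("d", udf(parse).apply($"s")).show()

    spark.stop()
  }
}
```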