I expect the following code to output "b" and null, since neither equals the string "a". However, Spark only outputs "b". To get the null row in the output, I have to explicitly include $"word".isNull in the filter.

val df = Seq(("a"),("b"),(null)).toDF("word")
df.filter($"word".notEqual("a")).show()

output:

+----+
|word|
+----+
|   b|
+----+

What am I missing about how Spark dataframe treats nulls?
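For reference, the behavior can be reproduced and worked around as sketched below. This assumes a local Spark session; under SQL three-valued logic, `null <> "a"` evaluates to null rather than true, and `filter` only keeps rows where the predicate is true, so the null row is dropped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.not

// Minimal sketch, assuming Spark is on the classpath and a local session is fine.
val spark = SparkSession.builder().master("local[1]").appName("null-filter").getOrCreate()
import spark.implicits._

val df = Seq(("a"), ("b"), (null)).toDF("word")

// null <> "a" is null (not true), so the null row is filtered out: only "b" survives.
df.filter($"word".notEqual("a")).show()

// Null-safe alternatives that keep both "b" and the null row:
df.filter($"word".isNull || $"word".notEqual("a")).show()
df.filter(not($"word" <=> "a")).show() // <=> is Column.eqNullSafe
```

The `<=>` operator (`eqNullSafe`) treats null as a comparable value, so `null <=> "a"` is false rather than null, and negating it keeps the null row.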

Alon Catz
  • I think that you have to use `$"word".isNull`, since null is not a String. From the Scala doc definition: _Null is - together with scala.Nothing - at the bottom of the Scala type hierarchy. Null is a subtype of all reference types; its only instance is the null reference. Since Null is not a subtype of value types, null is not a member of any such type. For instance, it is not possible to assign null to a variable of type `scala.Int`._ Maybe this link could be helpful: https://stackoverflow.com/questions/39727742/how-to-filter-out-a-null-value-from-spark-dataframe – Driss NEJJAR Dec 13 '18 at 10:53

0 Answers