I have the following DataFrame:

import spark.implicits._ // assumes an existing SparkSession named `spark`

val df = Seq(
  ("a", Some(1.0)),
  ("b", None),
  ("c", Some(3.0))
).toDF("id", "x")

df.show()

+---+----+
| id|   x|
+---+----+
|  a| 1.0|
|  b|null|
|  c| 3.0|
+---+----+

Then I do:

df.as[(String, Double)]
  .collect
  .foreach(println)

(a,1.0)
(b,-1.0) <-- why??
(c,3.0)

So the null is converted to -1.0. Why is that? I expected it to be mapped to 0.0. Interestingly, that's indeed the case if I do:

df.select($"x")
  .as[Double]
  .collect
  .foreach(println)

1.0
0.0
3.0

I'm aware that in my case mapping to Option[Double] or java.lang.Double is the way to go, but I would still be interested in understanding what Spark does with non-nullable types such as Double.
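For completeness, this is the Option[Double] mapping I mean; it keeps the missing value as None instead of inventing a sentinel:

df.as[(String, Option[Double])]
  .collect
  .foreach(println)

(a,Some(1.0))
(b,None)
(c,Some(3.0))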

I'm using Spark 2.1.1 with Scala 2.10.6, by the way.
