7

I am working with Spark v1.6. I have the following two DataFrames, and I want to convert the null values to 0 in the result of my left outer join. Any suggestions?

DataFrames

val x: Array[Int] = Array(1,2,3)
val df_sample_x = sc.parallelize(x).toDF("x")

val y: Array[Int] = Array(3,4,5)
val df_sample_y = sc.parallelize(y).toDF("y")

Left Outer Join

val df_sample_join = df_sample_x
  .join(df_sample_y,df_sample_x("x") === df_sample_y("y"),"left_outer")

ResultSet

scala> df_sample_join.show

x  |  y
--------
1  |  null
2  |  null
3  |  3

But I want the result set to be displayed as:

scala> df_sample_join.show

x  |  y
--------
1  |  0
2  |  0
3  |  3
Bartosz Konieczny
Prasan

3 Answers

13

Just use na.fill:

df_sample_join.na.fill(0, Seq("y"))
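
For reference, here is a minimal end-to-end sketch of na.fill applied to the joined DataFrame from the question. It assumes Spark 1.6 and either a spark-shell session or a local context; the getOrCreate calls, the app name, and the name df_sample_filled are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumption: reuse the running contexts (as in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("na-fill-sketch").setMaster("local[*]"))
val sqlContext = SQLContext.getOrCreate(sc)
import sqlContext.implicits._

// The two DataFrames and the left outer join from the question.
val df_sample_x = sc.parallelize(Seq(1, 2, 3)).toDF("x")
val df_sample_y = sc.parallelize(Seq(3, 4, 5)).toDF("y")
val df_sample_join = df_sample_x
  .join(df_sample_y, df_sample_x("x") === df_sample_y("y"), "left_outer")

// Replace nulls with 0, but only in column "y".
// df_sample_filled is a hypothetical name used for this sketch.
val df_sample_filled = df_sample_join.na.fill(0, Seq("y"))

// Row order after a join is not guaranteed, so sort before displaying.
df_sample_filled.orderBy("x").show()
```

Restricting the fill to Seq("y") leaves every other column untouched, which matters once the join carries more nullable numeric columns than the one you want to default.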
7

Try:

import org.apache.spark.sql.functions.{coalesce, lit}
val withReplacedNull = df_sample_join.withColumn("y", coalesce('y, lit(0)))

Tested on:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{coalesce, lit}
import org.apache.spark.sql.types._

val list = List(Row("a", null), Row("b", null), Row("c", 1))
val rdd = sc.parallelize(list)

// "y" contains nulls, so it must be declared nullable
val schema = StructType(
    StructField("text", StringType, false) ::
    StructField("y", IntegerType, true) :: Nil)

val df = sqlContext.createDataFrame(rdd, schema)
val df1 = df.withColumn("y", coalesce('y, lit(0)))
df1.show()
T. Gawęda
  • 1
    Thanks, it worked :) val df_sample_join = df_sample_x.join(df_sample_y, df_sample_x("x") === df_sample_y("y"), "left_outer").select(df_sample_x("x"), coalesce('y, lit(0))) – Prasan Nov 23 '16 at 20:36
3

You can fix your existing dataframe like this:

import org.apache.spark.sql.functions.{when, lit}
val correctedDf = df_sample_join.withColumn("y", when($"y".isNull, lit(0)).otherwise($"y"))

T. Gawęda's answer also works, but I think this is more readable.
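
To compare the two column-expression variants side by side, the following sketch applies both when/otherwise and coalesce to the question's join. It assumes Spark 1.6 with a spark-shell or local context; the getOrCreate setup and the names viaWhen, viaCoalesce, and pairs are hypothetical and used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{coalesce, lit, when}

// Assumption: reuse the running contexts (as in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("null-to-zero-sketch").setMaster("local[*]"))
val sqlContext = SQLContext.getOrCreate(sc)
import sqlContext.implicits._

// The two DataFrames and the left outer join from the question.
val df_sample_x = sc.parallelize(Seq(1, 2, 3)).toDF("x")
val df_sample_y = sc.parallelize(Seq(3, 4, 5)).toDF("y")
val df_sample_join = df_sample_x
  .join(df_sample_y, df_sample_x("x") === df_sample_y("y"), "left_outer")

// Variant 1: explicit null check with when/otherwise.
val viaWhen = df_sample_join
  .withColumn("y", when($"y".isNull, lit(0)).otherwise($"y"))

// Variant 2: coalesce returns its first non-null argument.
val viaCoalesce = df_sample_join
  .withColumn("y", coalesce($"y", lit(0)))

// Hypothetical helper: collect (x, y) pairs in a stable order for comparison.
def pairs(df: DataFrame): List[(Int, Int)] =
  df.orderBy("x").collect().map(r => (r.getInt(0), r.getInt(1))).toList

// Prints true when both variants yield the same rows.
println(pairs(viaWhen) == pairs(viaCoalesce))
```

Both expressions evaluate to 0 exactly where y is null, so the choice between them comes down to readability.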

Raphael Roth
  • 2
    In my opinion coalesce is more readable - it depends ;) But your answer is also correct, so upvote from my side – T. Gawęda Nov 23 '16 at 19:53