
I want to replicate the problem mentioned here using Scala DataFrames. I have tried the following approaches, so far without success.

Input

Col1  Col2
A       M
B       K
null    S

Expected Output

Col1  Col2
A       M
B       K
S       S   <---- filled from Col2

Approach 1

val output = df.na.fill("A", Seq("col1"))

The fill method expects a constant replacement value; it cannot take the replacement from another column.
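For reference, na.fill does work when the replacement is a literal; what it cannot do is pull the value from Col2. A minimal sketch, assuming a local SparkSession:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("A", "M"), ("B", "K"), (null, "S")).toDF("Col1", "Col2")

// na.fill only accepts a constant: the null in Col1 becomes the
// literal "X", not the corresponding Col2 value "S".
df.na.fill("X", Seq("Col1")).show()
// +----+----+
// |Col1|Col2|
// +----+----+
// |   A|   M|
// |   B|   K|
// |   X|   S|
// +----+----+
```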

Approach 2

val output = df.where(df.col("col1").isNull)

This identifies the null rows, but I cannot find a suitable method to call afterwards to replace them.
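The reason there is nothing suitable to call next: where/filter returns a new DataFrame containing only the matching rows, and DataFrames are immutable, so there is no way to patch those rows back into the original. A sketch of what the filter alone yields:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("A", "M"), ("B", "K"), (null, "S")).toDF("Col1", "Col2")

// The filter selects rows; it does not modify df, and the result
// has lost the non-null rows entirely.
df.where($"Col1".isNull).show()
// +----+----+
// |Col1|Col2|
// +----+----+
// |null|   S|
// +----+----+
```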

Approach 3

val output = df.dtypes.map(column =>
  column._2 match {
    case "null" => (column._2 -> 0)
  }).toMap

I get a StringType error.
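That error is most likely a scala.MatchError: dtypes describes the schema, not the data, returning one (columnName, typeName) pair per column, so the type name is "StringType" and the single case "null" never matches. A quick sketch of what dtypes actually yields:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("A", "M"), ("B", "K"), (null, "S")).toDF("Col1", "Col2")

// dtypes reports column names paired with their type names;
// nulls in the data never appear here.
df.dtypes.foreach(println)
// (Col1,StringType)
// (Col2,StringType)
```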

smaug

1 Answer


I'd use when/otherwise, as shown below:

import spark.implicits._
import org.apache.spark.sql.functions._

val df = Seq(
  ("A", "M"), ("B", "K"), (null, "S")
).toDF("Col1", "Col2")

df.withColumn("Col1", when($"Col1".isNull, $"Col2").otherwise($"Col1")).show
// +----+----+
// |Col1|Col2|
// +----+----+
// |   A|   M|
// |   B|   K|
// |   S|   S|
// +----+----+
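An equivalent spelling uses coalesce, which returns its first non-null argument; building on the same df:

```scala
import org.apache.spark.sql.functions.coalesce

// coalesce keeps Col1 where it is non-null, otherwise falls back to Col2
df.withColumn("Col1", coalesce($"Col1", $"Col2")).show
// +----+----+
// |Col1|Col2|
// +----+----+
// |   A|   M|
// |   B|   K|
// |   S|   S|
// +----+----+
```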
Leo C