
I have a simple piece of code that uses DataFrame.withColumn:

  test("SparkSQLTest") {
    val spark = SparkSession.builder().master("local").appName("SparkSQLTest").getOrCreate()
    import spark.implicits._
    var df = spark.createDataset(
      Seq(
        ("1", "2"),
        ("3", "4")
      )
    ).toDF("a", "b")
    df = df.withColumn("c", functions.lit(null.asInstanceOf[String]).as[String])
    df.printSchema()
    df.show(truncate = false)
  }

The output schema is:

root
 |-- a: string (nullable = true, metadata = {})
 |-- b: string (nullable = true, metadata = {})
 |-- c: null (nullable = true, metadata = {})

The type of column c is null, but I expected string. If the type is null, I can't write the DataFrame to CSV, because the null data type is not supported there.

How can I make column c's type correct?

Tom
  • try it .withColumn("c", lit(null).cast("string")) – Neil_TW Dec 23 '18 at 06:25
  • Possible duplicate of [Create new Dataframe with empty/null field values](https://stackoverflow.com/questions/32067467/create-new-dataframe-with-empty-null-field-values) – 10465355 Dec 23 '18 at 12:03

1 Answer


I've tried this and it works:

    df = df.withColumn("c", functions.lit(null.asInstanceOf[String]).cast(StringType))

Sorry, it should be cast rather than as[String]; I've modified the answer. Casting the null literal to StringType gives the column a concrete string type instead of the null type.
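For reference, a minimal self-contained sketch of the fix (the object name NullStringColumn is made up for illustration; it assumes Spark is on the classpath, and reuses the session setup and column names from the question):

```scala
import org.apache.spark.sql.{SparkSession, functions}
import org.apache.spark.sql.types.StringType

object NullStringColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("NullStringColumn").getOrCreate()
    import spark.implicits._

    var df = Seq(("1", "2"), ("3", "4")).toDF("a", "b")

    // Casting the null literal gives the column a concrete string type,
    // so printSchema shows "c: string" rather than "c: null",
    // and the DataFrame can be written to CSV.
    df = df.withColumn("c", functions.lit(null).cast(StringType))

    df.printSchema()
    df.show(truncate = false)

    spark.stop()
  }
}
```

The same effect can be had with the string form of the cast, `.cast("string")`, as suggested in the comments.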

Jiayi Liao