
I have a simple piece of code that uses DataFrame.withColumn:

  test("SparkSQLTest") {
    val spark = SparkSession.builder().master("local").appName("SparkSQLTest").getOrCreate()
    import spark.implicits._
    var df = spark.createDataset(
      Seq(
        ("1", "2"),
        ("3", "4")
      )
    ).toDF("a", "b")
    df = df.withColumn("c", functions.lit(null.asInstanceOf[String]).as[String])
    df.printSchema()
    df.show(truncate = false)
  }

The output schema is:

root
 |-- a: string (nullable = true, metadata = {})
 |-- b: string (nullable = true, metadata = {})
 |-- c: null (nullable = true, metadata = {})

The type of column c is null, but I expected string. If the type is null, I can't write the DataFrame to CSV, because the null data type is not supported there.

How can I make column c's type correct?

Tom
  • try it .withColumn("c", lit(null).cast("string")) – Neil_TW Dec 23 '18 at 06:25
  • Possible duplicate of [Create new Dataframe with empty/null field values](https://stackoverflow.com/questions/32067467/create-new-dataframe-with-empty-null-field-values) – 10465355 Dec 23 '18 at 12:03

1 Answer


I've tried this and it works:

    df = df.withColumn("c", functions.lit(null.asInstanceOf[String]).cast(StringType))

Sorry, it should be cast rather than as[String]; I've modified the answer. Casting the null literal to StringType gives the column a concrete string type instead of the null type.
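For reference, a minimal self-contained sketch of the fix (the object name NullStringColumn is made up for illustration; it assumes Spark is on the classpath, and reuses the session setup and column names from the question):

```scala
import org.apache.spark.sql.{SparkSession, functions}
import org.apache.spark.sql.types.StringType

object NullStringColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("NullStringColumn").getOrCreate()
    import spark.implicits._

    var df = Seq(("1", "2"), ("3", "4")).toDF("a", "b")

    // Casting the null literal gives the column a concrete string type,
    // so printSchema shows "c: string" rather than "c: null",
    // and the DataFrame can be written to CSV.
    df = df.withColumn("c", functions.lit(null).cast(StringType))

    df.printSchema()
    df.show(truncate = false)

    spark.stop()
  }
}
```

The same effect can be had with the string form of the cast, `.cast("string")`, as suggested in the comments.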

Jiayi Liao