I have some simple code that uses DataFrame.withColumn:
test("SparkSQLTest") {
  val spark = SparkSession.builder().master("local").appName("SparkSQLTest").getOrCreate()
  import spark.implicits._
  var df = spark.createDataset(
    Seq(
      ("1", "2"),
      ("3", "4")
    )
  ).toDF("a", "b")
  df = df.withColumn("c", functions.lit(null.asInstanceOf[String]).as[String])
  df.printSchema()
  df.show(truncate = false)
}
The output schema is:
root
|-- a: string (nullable = true, metadata = {})
|-- b: string (nullable = true, metadata = {})
|-- c: null (nullable = true, metadata = {})
Column c's type is null, but I expected it to be string. With a null type I can't write the DataFrame to CSV, because the null data type is not supported there.
How can I make column c's type string?
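For reference, here is a minimal sketch of the behaviour I am after, assuming the cause is that lit(null) produces a literal of null type and that .as[String] only attaches an encoder without changing the column's data type. Under that assumption, casting the literal with Column.cast should give the column a string type. The object name NullStringColumnSketch and the helper withNullStringColumn are hypothetical names used only for illustration; they are not part of my original code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

object NullStringColumnSketch {
  // Hypothetical helper: adds an all-null column whose declared type is string.
  // Assumption: cast changes the literal's data type, whereas .as[String] does not.
  def withNullStringColumn(df: DataFrame, name: String): DataFrame =
    df.withColumn(name, lit(null).cast(StringType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("SparkSQLTest").getOrCreate()
    import spark.implicits._

    // Same two-row input as in the test above.
    val df = Seq(("1", "2"), ("3", "4")).toDF("a", "b")

    val result = withNullStringColumn(df, "c")
    result.printSchema()
    result.show(truncate = false)

    spark.stop()
  }
}
```

If the assumption holds, printSchema() should report c as string rather than null, and the CSV writer should then accept the column. Passing the type name as a string, as in cast("string"), is an equivalent way to express the same cast.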