I'm using withColumn to override a column (applying the same value to every row of the data frame). My problem is that withColumn changes the nullable property of the column:
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.lit
val schema = StructType(Array(
  StructField("id", StringType, true),
  StructField("name", StringType, true)
))
// id is declared as StringType, so the values must be strings
val data = Seq(Row("1", "pepsi"), Row("2", "coca cola"))
val rdd = spark.sparkContext.parallelize(data)
val df = spark.createDataFrame(rdd, schema)
// withColumn returns a new DataFrame, so print the schema of the result
val masked = df.withColumn("name", lit("*******"))
masked.printSchema
Result:
root
|-- id: string (nullable = true)
|-- name: string (nullable = false)
The best idea I have is to change the schema after the manipulation. I was wondering if someone has a better idea.
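For reference, this is roughly what I mean by changing the schema afterwards. It is only a minimal sketch: it assumes the original schema is still at hand and that the column order is unchanged, and the names masked and restored and the app name are just placeholders I made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// In spark-shell this returns the existing session; elsewhere it starts a local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nullable-sketch") // placeholder name
  .getOrCreate()

val schema = StructType(Array(
  StructField("id", StringType, true),
  StructField("name", StringType, true)
))
val data = Seq(Row("1", "pepsi"), Row("2", "coca cola"))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

// lit(...) is a non-null literal, so the replaced column is marked non-nullable
val masked = df.withColumn("name", lit("*******"))

// Rebuild the DataFrame from its rows with the original schema,
// which declares name as nullable again
val restored = spark.createDataFrame(masked.rdd, schema)
restored.printSchema
```

This goes through the RDD API and back, which feels like a heavy detour just to flip one flag, hence the question.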
Thanks!