
I'm using withColumn to override a certain column (applying the same value to the entire data frame); my problem is that withColumn changes the nullable property of the column:

import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.lit

val schema = StructType(Array(
                 StructField("id", StringType, true),
                 StructField("name", StringType, true)
             ))
val data = Seq(Row("1", "pepsi"), Row("2", "coca cola"))
val rdd = spark.sparkContext.parallelize(data)
val df = spark.createDataFrame(rdd, schema)
df.withColumn("name", lit("*******")).printSchema

result:

root
 |-- id: string (nullable = true)
 |-- name: string (nullable = false)

The best idea I have is to change the schema after the manipulation; I was wondering if someone has a better idea.
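For reference, a minimal sketch of that schema-rewrite idea, assuming a local SparkSession; `setNullable` is a hypothetical helper name, not a Spark API:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder.master("local[1]").getOrCreate()
val schema = StructType(Array(
  StructField("id", StringType, true),
  StructField("name", StringType, true)))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row("1", "pepsi"), Row("2", "coca cola"))),
  schema)

// Rebuild the DataFrame against a copy of its schema with the given
// columns forced back to nullable = true (hypothetical helper).
def setNullable(src: DataFrame, columns: Set[String]): DataFrame = {
  val newSchema = StructType(src.schema.map {
    case StructField(name, dataType, _, metadata) if columns(name) =>
      StructField(name, dataType, nullable = true, metadata)
    case field => field
  })
  src.sparkSession.createDataFrame(src.rdd, newSchema)
}

val fixed = setNullable(df.withColumn("name", lit("*******")), Set("name"))
fixed.printSchema  // name comes back as nullable = true
```

The round trip through `createDataFrame` costs a schema re-application over the RDD, which is why a cheaper expression-level trick would be preferable.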

Thanks!

Golan Kiviti
  • Check the 2nd answer here: https://stackoverflow.com/questions/33193958/change-nullable-property-of-column-in-spark-dataframe – Mohana B C Aug 03 '21 at 08:03
  • Mohana is right. The literal column's nullability is defined as `override def nullable: Boolean = value == null` (in Literal's source code), so we can't change it directly; that answer wraps it in `when`. – tianzhipeng Aug 03 '21 at 08:59
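A minimal sketch of the `when` workaround mentioned in the comments, assuming a local SparkSession:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{lit, when}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder.master("local[1]").getOrCreate()
val schema = StructType(Array(
  StructField("id", StringType, true),
  StructField("name", StringType, true)))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row("1", "pepsi"), Row("2", "coca cola"))),
  schema)

// A `when` with no `otherwise` branch is nullable by definition, so the
// column stays nullable = true even though every row gets the same literal.
val masked = df.withColumn("name", when(lit(true), lit("*******")))
masked.printSchema
// root
//  |-- id: string (nullable = true)
//  |-- name: string (nullable = true)
```

Unlike rebuilding the schema, this keeps everything at the expression level and avoids a round trip through the RDD.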
