I have a CSV file, test.csv:
col
1
2
3
4
When I read it using Spark, the schema of the data is inferred correctly:
val df = spark.read.option("header", "true").option("inferSchema", "true").csv("test.csv")
df.printSchema
root
|-- col: integer (nullable = true)
But when I override the schema of the CSV file and set inferSchema to false, SparkSession picks up the custom schema only partially.
import org.apache.spark.sql.types.{StructField, StringType, StructType}

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "false")
  .schema(StructType(List(StructField("custom", StringType, false))))
  .csv("test.csv")
df.printSchema
root
|-- custom: string (nullable = true)
I mean that only the column name (custom) and the DataType (StringType) are being picked up. The nullable flag is being ignored: it still comes back as nullable = true, which is incorrect.
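To make the mismatch explicit, here is a minimal check comparing the nullability I supply with the nullability Spark reports back (assuming the same test.csv as above):

import org.apache.spark.sql.types.{StructField, StringType, StructType}

val supplied = StructType(List(StructField("custom", StringType, false)))
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "false")
  .schema(supplied)
  .csv("test.csv")

println(supplied("custom").nullable)  // false, the nullability I asked for
println(df.schema("custom").nullable) // true, the nullability Spark reports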
I am not able to understand this behavior. Any help is appreciated!