I need to read a CSV file in Spark with a specific date format, but I still end up with the date column interpreted as a plain string instead of a date.
Input CSV file:
cat oo2.csv
date,something
2013.01.02,0
2013.03.21,0
With Spark 3.1.1:
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("Hmmm")
  .getOrCreate()

val oo = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("dateFormat", "yyyy.MM.dd")
  .csv("oo2.csv")
oo.printSchema()
oo.show()
I get:
root
|-- date: string (nullable = true)
|-- something: integer (nullable = true)
+----------+---------+
| date|something|
+----------+---------+
|2013.01.02|        0|
|2013.03.21|        0|
+----------+---------+
Am I missing something? It should be simple; basically the same approach is described in https://stackoverflow.com/a/46299504/1408096, but no joy...
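For reference, the variant I'd expect to sidestep inference entirely: an explicit schema instead of inferSchema, so that dateFormat has a DateType column to apply to (my assumption is that CSV schema inference in Spark 3.1 never produces DateType on its own). Untested sketch:

```scala
import org.apache.spark.sql.types.{StructType, StructField, DateType, IntegerType}

val schema = StructType(Seq(
  StructField("date", DateType),        // declare the column as a date explicitly
  StructField("something", IntegerType)
))

val oo2 = spark.read
  .option("header", "true")
  .option("dateFormat", "yyyy.MM.dd")   // pattern applied when parsing DateType columns
  .schema(schema)                       // no inferSchema: types come from the schema
  .csv("oo2.csv")
```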
PS: if I try to parse the date outside Spark:
import java.text.SimpleDateFormat

val a = new SimpleDateFormat("yyyy.MM.dd")
a.parse("2013.01.02")
it works perfectly fine.
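Note that Spark 3.x no longer uses SimpleDateFormat internally; its dateFormat option is interpreted with java.time.format.DateTimeFormatter patterns. So a closer out-of-Spark check of the same pattern would be:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// The same "yyyy.MM.dd" pattern, but parsed the way Spark 3.x parses dates
val fmt = DateTimeFormatter.ofPattern("yyyy.MM.dd")
val d: LocalDate = LocalDate.parse("2013.01.02", fmt)
println(d) // 2013-01-02
```

This parses fine too, which suggests the pattern itself is valid for Spark 3 and the problem lies in when the option is applied, not in the pattern.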