
I have a CSV file:

Name;Date
A;2018-01-01 10:15:25.123456
B;2018-12-31 10:15:25.123456

I try to parse it into a Spark DataFrame:

val df = spark.read.format("csv")
    .option("header", true)
    .option("delimiter", ";")
    .option("inferSchema", true)
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSS")
    .load("file.csv")  // path to the CSV shown above

But the resulting DataFrame is (wrongly) truncated to millisecond precision:

scala> df.show(truncate=false)
+----+-----------------------+
|Name|Date                   |
+----+-----------------------+
|A   |2018-01-01 10:17:28.456|
|B   |2018-12-31 10:17:28.456|
+----+-----------------------+


df.first()(1).asInstanceOf[Timestamp].getNanos()
res51: Int = 456000000
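The shifted values in the output can be reproduced outside of Spark. Assuming Spark 2.x, which parses `timestampFormat` with `SimpleDateFormat`, the sketch below shows why `10:15:25.123456` comes out as `10:17:28.456`:

```scala
import java.text.SimpleDateFormat

// SimpleDateFormat is lenient by default and treats "SSSSSS" as a plain
// millisecond count: the fraction "123456" is read as 123456 ms
// (= 2 min 3.456 s) and rolled into the minutes and seconds.
val in  = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS")
val out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
println(out.format(in.parse("2018-01-01 10:15:25.123456")))
// prints 2018-01-01 10:17:28.456
```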

Bonus question: how to read with nanosecond precision?

Benjamin
Possible duplicate of https://stackoverflow.com/questions/41879125/handling-microseconds-in-spark-scala – Gal Naor Jun 18 '19 at 18:10
  • Possible duplicate of [Handling microseconds in Spark Scala](https://stackoverflow.com/questions/41879125/handling-microseconds-in-spark-scala) – mazaneicha Jun 18 '19 at 18:49

1 Answer


`.SSSSSS` means milliseconds, not microseconds: see [java.util.Date format SSSSSS: if not microseconds what are the last 3 digits?](https://stackoverflow.com/questions/22304021) and the [SimpleDateFormat documentation](https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html). So if you need microseconds, you should parse the date with custom code, as in [Handling microseconds in Spark Scala](https://stackoverflow.com/questions/41879125/handling-microseconds-in-spark-scala).
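A minimal sketch of the custom-code route: read the `Date` column as a string instead of letting Spark infer a timestamp, then convert it yourself. `java.sql.Timestamp.valueOf` accepts up to nanosecond precision, so the microseconds survive:

```scala
import java.sql.Timestamp

// Timestamp.valueOf parses the full fractional part of the literal,
// so the microseconds from the CSV are preserved.
val ts = Timestamp.valueOf("2018-01-01 10:15:25.123456")
println(ts.getNanos) // 123456000
```

In a Spark job you would wrap this conversion in a UDF applied to the string column (reading `Date` as `StringType` first); that wiring is left out here.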

Bonus answer: Spark SQL stores timestamps with microsecond precision internally, so to keep nanoseconds you could store them in a string, in a separate field, or use any other custom solution.
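One way to sketch the "separate field" idea, using only `java.time` (the column names and split are illustrative, not Spark API): parse the raw string yourself, keep the timestamp truncated to what Spark SQL can hold, and carry the sub-microsecond remainder in an extra `Long` column.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Parse a nanosecond-precision literal with java.time, which has no
// precision loss, then split off the part Spark SQL would drop.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS")
val ldt = LocalDateTime.parse("2018-01-01 10:15:25.123456789", fmt)

val nanosOfSecond = ldt.getNano          // full fraction: 123456789
val subMicroNanos = nanosOfSecond % 1000 // remainder beyond micros: 789
```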

Artem Aliev