
I have a CSV file:

Name;Date
A;2018-01-01 10:15:25.123456
B;2018-12-31 10:15:25.123456

I try to parse it into a Spark DataFrame:

val df = spark.read.format("csv")
    .option("header", true)
    .option("delimiter", ";")
    .option("inferSchema", true)
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSS")
    .load("file.csv")  // path to the CSV shown above

But the resulting DataFrame is (wrongly) truncated to millisecond precision:

scala> df.show(truncate=false)
+----+-----------------------+
|Name|Date                   |
+----+-----------------------+
|A   |2018-01-01 10:17:28.456|
|B   |2018-12-31 10:17:28.456|
+----+-----------------------+


df.first()(1).asInstanceOf[Timestamp].getNanos()
res51: Int = 456000000
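The shifted values in the output can be reproduced outside of Spark. Assuming Spark 2.x, which parses `timestampFormat` with `SimpleDateFormat`, the sketch below shows why `10:15:25.123456` comes out as `10:17:28.456`:

```scala
import java.text.SimpleDateFormat

// SimpleDateFormat is lenient by default and treats "SSSSSS" as a plain
// millisecond count: the fraction "123456" is read as 123456 ms
// (= 2 min 3.456 s) and rolled into the minutes and seconds.
val in  = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS")
val out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
println(out.format(in.parse("2018-01-01 10:15:25.123456")))
// prints 2018-01-01 10:17:28.456
```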

Bonus question: how to read with nanosecond precision?

Benjamin
Possible duplicate of https://stackoverflow.com/questions/41879125/handling-microseconds-in-spark-scala – Gal Naor Jun 18 '19 at 18:10
  • Possible duplicate of [Handling microseconds in Spark Scala](https://stackoverflow.com/questions/41879125/handling-microseconds-in-spark-scala) – mazaneicha Jun 18 '19 at 18:49

1 Answer


`.SSSSSS` means milliseconds, not microseconds: see [java.util.Date format SSSSSS: if not microseconds what are the last 3 digits?](https://stackoverflow.com/questions/22304021) and the [SimpleDateFormat documentation](https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html). So if you need microseconds, you should parse the date with custom code, as in [Handling microseconds in Spark Scala](https://stackoverflow.com/questions/41879125/handling-microseconds-in-spark-scala).
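A minimal sketch of the custom-code route: read the `Date` column as a string instead of letting Spark infer a timestamp, then convert it yourself. `java.sql.Timestamp.valueOf` accepts up to nanosecond precision, so the microseconds survive:

```scala
import java.sql.Timestamp

// Timestamp.valueOf parses the full fractional part of the literal,
// so the microseconds from the CSV are preserved.
val ts = Timestamp.valueOf("2018-01-01 10:15:25.123456")
println(ts.getNanos) // 123456000
```

In a Spark job you would wrap this conversion in a UDF applied to the string column (reading `Date` as `StringType` first); that wiring is left out here.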

Bonus answer: Spark SQL stores timestamps with microsecond precision internally, so to keep nanoseconds you could store them in a string, in a separate field, or use any other custom solution.
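One way to sketch the "separate field" idea, using only `java.time` (the column names and split are illustrative, not Spark API): parse the raw string yourself, keep the timestamp truncated to what Spark SQL can hold, and carry the sub-microsecond remainder in an extra `Long` column.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Parse a nanosecond-precision literal with java.time, which has no
// precision loss, then split off the part Spark SQL would drop.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS")
val ldt = LocalDateTime.parse("2018-01-01 10:15:25.123456789", fmt)

val nanosOfSecond = ldt.getNano          // full fraction: 123456789
val subMicroNanos = nanosOfSecond % 1000 // remainder beyond micros: 789
```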

Artem Aliev