I'm trying to read a csv file which has timestamps till nano seconds. sample content of file TestTimestamp.csv-
spark- 2.4.0, scala - 2.11.11
/**
* TestTimestamp.csv -
* 101,2019-SEP-23 11.42.35.456789123 AM
*
*/
Tried to read it using timestampFormat = "yyyy-MMM-dd hh.mm.ss.SSSSSSSSS aaa"
val dataSchema = StructType(Array(StructField("ID", DoubleType, true), StructField("Created_TS", TimestampType, true)))
val data = spark.read.format("csv")
.option("header", "false")
.option("inferSchema", "false")
.option("treatEmptyValuesAsNulls", "true")
//.option("nullValue", "")
.option("dateFormat", "yyyy-MMM-dd")
.option("timestampFormat", "yyyy-MMM-dd hh.mm.ss.SSSSSSSSS aaa")
.schema(dataSchema)
.load("C:\\TestData\\Raw\\TetraPak\\Shipments\\TestTimeStamp.csv")
data.select('Created_TS).show
Output which I get is completely wrong date-time. 23rd Sept got changed to 28th September
+--------------------+
| Created_TS|
+--------------------+
|2019-09-28 18:35:...|
+--------------------+
Even if I have Hours in 24 Hour formats like - "2019-SEP-23 16.42.35.456789123" and I try to use only first few digits of second fractions by giving timestampFormat = "yyyy-MMM-dd HH.mm.ss.SSS"
similar incorrect result-
val data2 = spark.read.format("csv")
.option("header", "false")
.option("inferSchema", "false")
.option("treatEmptyValuesAsNulls", "true")
//.option("nullValue", "")
.option("dateFormat", "yyyy-MMM-dd")
.option("timestampFormat", "yyyy-MMM-dd hh.mm.ss.SSS")
.schema(dataSchema)
.load("C:\\TestData\\Raw\\TetraPak\\Shipments\\TestTimeStamp.csv")
data2.select('Created_TS).show
+--------------------+
| Created_TS|
+--------------------+
|2019-09-28 23:35:...|
+--------------------+
is there any way to parse such timestamp strings while creating dataframe using csv reader ?