I'm trying to compare timestamps within a map, but Spark seems to be using a different timezone, or something else really weird is going on. I read a dummy CSV file like the following to build the input DataFrame:
"ts"
"1970-01-01 00:00:00"
"1970-01-01 00:00:00"
df.show(2)
+-------------------+
|                 ts|
+-------------------+
|1970-01-01 00:00:00|
|1970-01-01 00:00:00|
+-------------------+
So far, nothing to report, but then:
import java.sql.Timestamp
import java.time.Instant

df.rdd.map { row =>
  val timestamp = row.getTimestamp(0)
  val timestampMilli = timestamp.toInstant.toEpochMilli
  val epoch = Timestamp.from(Instant.EPOCH)
  val epochMilli = epoch.toInstant.toEpochMilli
  (timestamp, timestampMilli, epoch, epochMilli)
}.foreach(println)
(1970-01-01 00:00:00.0,-3600000,1970-01-01 01:00:00.0,0)
(1970-01-01 00:00:00.0,-3600000,1970-01-01 01:00:00.0,0)
I don't understand why both timestamps don't come out as 1970-01-01 00:00:00.0 with an epoch millisecond value of 0. Does anyone know what I'm missing?
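A sanity check I can run is to print the default timezone the executor JVMs actually see, since both CSV timestamp parsing and java.sql.Timestamp.toString can fall back to it; this is just a diagnostic sketch, not part of my actual job:

import java.util.TimeZone

// Collect the distinct default timezone IDs reported by the executors,
// to verify that the UTC setting actually reached the worker JVMs.
df.rdd
  .map(_ => TimeZone.getDefault.getID)
  .distinct()
  .collect()
  .foreach(println)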
NB: I have already set the session timezone to UTC using the following properties:
spark.sql.session.timeZone=UTC
user.timezone=UTC
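For reference, here is roughly how I set them; the builder call is one way to do it, and the launch flags below are a sketch of my setup rather than my exact command:

import org.apache.spark.sql.SparkSession

// The SQL session timezone can be set through the session builder.
val spark = SparkSession.builder()
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate()

// user.timezone is a plain JVM system property, so it has to reach the
// driver and executor JVMs at launch, e.g. via spark-submit:
//   spark-submit \
//     --conf spark.driver.extraJavaOptions=-Duser.timezone=UTC \
//     --conf spark.executor.extraJavaOptions=-Duser.timezone=UTC \
//     ...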