
Using Scala 2.10.4 with Spark 1.5.1 and Spark 1.6:

sqlContext.sql(
  """
    |select id,
    |  to_date(from_utc_timestamp(from_unixtime(at), 'US/Pacific')),
    |  from_utc_timestamp(from_unixtime(at), 'US/Pacific'),
    |  from_unixtime(at),
    |  to_date(from_unixtime(at)),
    |  at
    |from events
    |limit 100
  """.stripMargin).collect().foreach(println)

Spark-Submit options: --driver-java-options '-Duser.timezone=US/Pacific'

result:

[56d2a9573bc4b5c38453eae7,2016-02-28,2016-02-27 16:01:27.0,2016-02-28 08:01:27,2016-02-28,1456646487]
[56d2aa1bfd2460183a571762,2016-02-28,2016-02-27 16:04:43.0,2016-02-28 08:04:43,2016-02-28,1456646683]
[56d2aaa9eb63bbb63456d5b5,2016-02-28,2016-02-27 16:07:05.0,2016-02-28 08:07:05,2016-02-28,1456646825]
[56d2aab15a21fa5f4c4f42a7,2016-02-28,2016-02-27 16:07:13.0,2016-02-28 08:07:13,2016-02-28,1456646833]
[56d2aac8aeeee48b74531af0,2016-02-28,2016-02-27 16:07:36.0,2016-02-28 08:07:36,2016-02-28,1456646856]
[56d2ab1d87fd3f4f72567788,2016-02-28,2016-02-27 16:09:01.0,2016-02-28 08:09:01,2016-02-28,1456646941]

The time in US/Pacific should be 2016-02-28 00:01:27, etc., but somehow it subtracts 8 hours twice.
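
For reference, checking the same epoch value outside Spark with plain java.time (just a quick sketch, independent of Spark):

import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// The first epoch value from the result set above.
val instant = Instant.ofEpochSecond(1456646487L)
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

println(fmt.withZone(ZoneId.of("UTC")).format(instant))        // 2016-02-28 08:01:27
println(fmt.withZone(ZoneId.of("US/Pacific")).format(instant)) // 2016-02-28 00:01:27 (expected)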

– Gaurav Shah

2 Answers


After reading for some time, these are the conclusions:

  • Spark SQL doesn't support date-time, nor time zones
  • Using timestamp is the only solution
  • from_unixtime(at) parses the epoch time correctly; it is only the printing of it as a string that changes it due to the time zone. It is safe to assume that from_unixtime will convert it correctly (although printing it might show different results)
  • from_utc_timestamp will shift (not just convert) the timestamp to that time zone; in this case it subtracts 8 hours from the time, since US/Pacific is (-08:00)
  • printing SQL results messes up the times with respect to the timezone param (see the sketch after this list)
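
To make the double subtraction concrete, here is a plain-JVM sketch for the first row's epoch value (java.time arithmetic only, not Spark internals):

import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

val fmt     = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val instant = Instant.ofEpochSecond(1456646487L)  // 2016-02-28 08:01:27 UTC

// Subtraction 1: from_utc_timestamp shifts the instant itself by the
// US/Pacific offset (-08:00), producing a different instant.
val shifted = instant.minusSeconds(8 * 3600)      // 2016-02-28 00:01:27 UTC

// Subtraction 2: the driver, running with -Duser.timezone=US/Pacific,
// renders the resulting timestamp another 8 hours earlier when printing.
println(fmt.withZone(ZoneId.of("US/Pacific")).format(shifted)) // 2016-02-27 16:01:27
println(fmt.withZone(ZoneId.of("UTC")).format(instant))        // 2016-02-28 08:01:27, as shown by from_unixtime
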
– Gaurav Shah

  • from_unixtime(at) does what from_utc_timestamp does too: it will parse a Unix timestamp integer (seconds since midnight 1970-01-01) and convert the time instant parsed from UTC to the system's default time zone. – user180940 Oct 17 '16 at 19:14
  • This appears to have been resolved in `Spark 2.3.2`, read [this](https://stackoverflow.com/questions/45558499/spark-sql-converting-string-to-timestamp#comment96080435_45558499) – y2k-shubham Feb 12 '19 at 07:16

For the record, here is how we convert such Long values, using a UDF.

For our purpose, we are interested only in the date string representation of the timestamp (given as milliseconds since the epoch, in UTC):

// Format an epoch value given in milliseconds as a "yyyy-MM-dd" date string,
// always rendered in UTC, regardless of the JVM's default time zone.
val udfToDateUTC = udf((epochMilliUTC: Long) => {
  val dateFormatter = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(java.time.ZoneId.of("UTC"))
  dateFormatter.format(java.time.Instant.ofEpochMilli(epochMilliUTC))
})

This way, we control the parsing as well as the rendering of the dates.
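
As a usage sketch, it could be applied like this (the DataFrame df and the epoch-millis column name at_ms are placeholders for illustration, not part of the original code):

import org.apache.spark.sql.functions.col

// Hypothetical: df is an existing DataFrame with an epoch-millis column "at_ms".
val withDate = df.withColumn("event_date_utc", udfToDateUTC(col("at_ms")))
withDate.select("event_date_utc").show(5)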

– Michel Lemay