
Using Scala 2.10.4 with Spark 1.5.1 and Spark 1.6:

sqlContext.sql(
  """
    |select id,
    |  to_date(from_utc_timestamp(from_unixtime(at), 'US/Pacific')),
    |  from_utc_timestamp(from_unixtime(at), 'US/Pacific'),
    |  from_unixtime(at),
    |  to_date(from_unixtime(at)),
    |  at
    |from events
    |limit 100
  """.stripMargin).collect().foreach(println)

Spark-Submit options: --driver-java-options '-Duser.timezone=US/Pacific'

result:

[56d2a9573bc4b5c38453eae7,2016-02-28,2016-02-27 16:01:27.0,2016-02-28 08:01:27,2016-02-28,1456646487]
[56d2aa1bfd2460183a571762,2016-02-28,2016-02-27 16:04:43.0,2016-02-28 08:04:43,2016-02-28,1456646683]
[56d2aaa9eb63bbb63456d5b5,2016-02-28,2016-02-27 16:07:05.0,2016-02-28 08:07:05,2016-02-28,1456646825]
[56d2aab15a21fa5f4c4f42a7,2016-02-28,2016-02-27 16:07:13.0,2016-02-28 08:07:13,2016-02-28,1456646833]
[56d2aac8aeeee48b74531af0,2016-02-28,2016-02-27 16:07:36.0,2016-02-28 08:07:36,2016-02-28,1456646856]
[56d2ab1d87fd3f4f72567788,2016-02-28,2016-02-27 16:09:01.0,2016-02-28 08:09:01,2016-02-28,1456646941]

The time in US/Pacific should be 2016-02-28 00:01:27, etc., but somehow it subtracts 8 hours twice.
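
For reference, checking the same epoch value outside Spark with plain java.time (just a quick sketch, independent of Spark):

import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// The first epoch value from the result set above.
val instant = Instant.ofEpochSecond(1456646487L)
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

println(fmt.withZone(ZoneId.of("UTC")).format(instant))        // 2016-02-28 08:01:27
println(fmt.withZone(ZoneId.of("US/Pacific")).format(instant)) // 2016-02-28 00:01:27 (expected)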

– Gaurav Shah

2 Answers


After reading for some time, these are the conclusions:

  • Spark SQL doesn't support date-time, nor time zones
  • Using timestamp is the only solution
  • from_unixtime(at) parses the epoch time correctly; it is only the printing of it as a string that changes it due to the time zone. It is safe to assume that from_unixtime will convert it correctly (although printing it might show different results)
  • from_utc_timestamp will shift (not just convert) the timestamp to that time zone; in this case it subtracts 8 hours from the time, since US/Pacific is (-08:00)
  • printing SQL results messes up the times with respect to the timezone param (see the sketch after this list)
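
To make the double subtraction concrete, here is a plain-JVM sketch for the first row's epoch value (java.time arithmetic only, not Spark internals):

import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

val fmt     = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val instant = Instant.ofEpochSecond(1456646487L)  // 2016-02-28 08:01:27 UTC

// Subtraction 1: from_utc_timestamp shifts the instant itself by the
// US/Pacific offset (-08:00), producing a different instant.
val shifted = instant.minusSeconds(8 * 3600)      // 2016-02-28 00:01:27 UTC

// Subtraction 2: the driver, running with -Duser.timezone=US/Pacific,
// renders the resulting timestamp another 8 hours earlier when printing.
println(fmt.withZone(ZoneId.of("US/Pacific")).format(shifted)) // 2016-02-27 16:01:27
println(fmt.withZone(ZoneId.of("UTC")).format(instant))        // 2016-02-28 08:01:27, as shown by from_unixtime
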
– Gaurav Shah

  • from_unixtime(at) does what from_utc_timestamp does too: it will parse a Unix timestamp integer (seconds since midnight 1970-01-01) and convert the time instant parsed from UTC to the system's default time zone. – user180940 Oct 17 '16 at 19:14
  • This appears to have been resolved in `Spark 2.3.2`, read [this](https://stackoverflow.com/questions/45558499/spark-sql-converting-string-to-timestamp#comment96080435_45558499) – y2k-shubham Feb 12 '19 at 07:16

For the record, here is how we convert such Long values, using a UDF.

For our purpose, we are interested only in the date string representation of the timestamp (given as milliseconds since the epoch, in UTC):

// Format an epoch value given in milliseconds as a "yyyy-MM-dd" date string,
// always rendered in UTC, regardless of the JVM's default time zone.
val udfToDateUTC = udf((epochMilliUTC: Long) => {
  val dateFormatter = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(java.time.ZoneId.of("UTC"))
  dateFormatter.format(java.time.Instant.ofEpochMilli(epochMilliUTC))
})

This way, we control the parsing as well as the rendering of the dates.
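
As a usage sketch, it could be applied like this (the DataFrame df and the epoch-millis column name at_ms are placeholders for illustration, not part of the original code):

import org.apache.spark.sql.functions.col

// Hypothetical: df is an existing DataFrame with an epoch-millis column "at_ms".
val withDate = df.withColumn("event_date_utc", udfToDateUTC(col("at_ms")))
withDate.select("event_date_utc").show(5)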

– Michel Lemay