I am desperately trying to change the time zone of the JVM in sparklyr (using Spark 2.1.0). I want GMT everywhere.

I am setting:

config$`driver.extraJavaOptions` <-"Duser.timezone=GMT"

in my spark_config() file but unfortunately, in the Spark UI I still see (under System Properties) that user.timezone is set to America/New_York.

Any ideas? Thanks!


1 Answer

A few things:

  • The name of the property is spark.driver.extraJavaOptions, not driver.extraJavaOptions.
  • The value is missing a leading -; it should be -Duser.timezone=GMT.
  • For consistency you need both spark.driver.extraJavaOptions and spark.executor.extraJavaOptions (see the corrected sketch after this list).
  • In the general case, spark.driver.extraJavaOptions and similar properties should be set outside the application. As explained in the official documentation:

    In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.

    On the driver, calling the corresponding Java methods should work:

    # sc is a spark_shell_connection / spark_connection
    # %>% is the magrittr pipe (also available via library(dplyr))
    sparklyr::invoke_static(sc, "java.util.TimeZone", "getTimeZone", "GMT") %>%
      sparklyr::invoke_static(sc, "java.util.TimeZone", "setDefault", .)
    

    but might not be reflected in the UI, and you'll still need spark.executor.extraJavaOptions.
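Putting the first three points together, a corrected sparklyr setup could look like this minimal sketch (assuming a local master; per the quoted documentation, the driver option may still be ignored in client mode, because the driver JVM is already running when the config is applied):

    library(sparklyr)

    config <- spark_config()
    # Full property names, with the leading "-" on the JVM option
    config$spark.driver.extraJavaOptions   <- "-Duser.timezone=GMT"
    config$spark.executor.extraJavaOptions <- "-Duser.timezone=GMT"

    sc <- spark_connect(master = "local", config = config)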

In the general case you should edit spark-defaults.conf in the configuration directory to include:

spark.driver.extraJavaOptions -Duser.timezone=GMT
spark.executor.extraJavaOptions -Duser.timezone=GMT

If you cannot modify the main configuration, you can create an application-specific directory and point to it using the SPARK_CONF_DIR environment variable.
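For example (a sketch; ~/my-app-conf is a hypothetical path, and the variable must be set before spark_connect() launches the JVM):

    # Hypothetical application-specific configuration directory that
    # contains the spark-defaults.conf lines shown above
    Sys.setenv(SPARK_CONF_DIR = "~/my-app-conf")

    library(sparklyr)
    # spark-submit, launched by spark_connect(), inherits SPARK_CONF_DIR
    # from the R session's environment
    sc <- spark_connect(master = "local")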

In recent versions you can also set spark.sql.session.timeZone in the application itself (note that it is different from the corresponding JVM option and affects only Spark SQL queries).
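For instance (a sketch; since this is a regular Spark SQL property, it can be passed through spark_config() like any other):

    config <- spark_config()
    # Affects how Spark SQL interprets and renders timestamps;
    # it does not change the JVM default time zone
    config$spark.sql.session.timeZone <- "GMT"
    sc <- spark_connect(master = "local", config = config)

    # It is also a runtime-settable SQL conf, so this should work as well:
    # DBI::dbGetQuery(sc, "SET spark.sql.session.timeZone=GMT")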

  • thank you. I actually included the dash in my code and tried it with both the executor and the driver; it does not seem to work. Should the solution you are suggesting with invoke be run before starting the Spark session? – ℕʘʘḆḽḘ Sep 17 '18 at 10:17
  • `invoke_static` takes the session as its first argument, so after. However, the right way to do it is to use configuration files. – zero323 Sep 17 '18 at 10:19
  • I tried to run the `invoke_static` commands and that returns `NULL`. After that it still seems I am in EST: running something like `unix_t = from_utc_timestamp(timestamp(t), 'UTC'), myday = as.character(to_date(unix_t))`, `myday` is shown converted to `EST` instead of `GMT`... I am really puzzled here. A possible solution is to use `spark_apply`, but this is prohibitively slow: https://stackoverflow.com/questions/49355077/how-to-convert-a-timestamp-into-string-without-changing-timezone – ℕʘʘḆḽḘ Sep 17 '18 at 12:05
  • That's because it handles only the driver side, not the executors. Really, use configuration files. – zero323 Sep 17 '18 at 14:19
  • thank you. What should I put in the `SPARK_CONF_DIR` then, assuming I cannot modify the main conf files? – ℕʘʘḆḽḘ Sep 18 '18 at 12:54
  • Thank you, I was facing the same problem of Spark converting UTC timestamps to the local time zone. Solved it by following @zero323's recommendation: add these two lines to the spark-defaults.conf file: `spark.driver.extraJavaOptions -Duser.timezone=UTC` and `spark.executor.extraJavaOptions -Duser.timezone=UTC` – Willian Adamczyk Aug 23 '22 at 09:43