In PySpark, one can get the local time from a UTC time by passing the timestamp and the timezone to the function from_utc_timestamp:
>>> from pyspark.sql.functions import from_utc_timestamp
>>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
>>> df.select(from_utc_timestamp(df.t, "PST").alias('t')).collect()
[Row(t=datetime.datetime(1997, 2, 28, 2, 30))]
The timezone here is provided as a string literal ("PST"). If one were to have the following data structure:
+-------------------------+--------+
|utc_time                 |timezone|
+-------------------------+--------+
|2018-08-03T23:27:30.000Z |PST     |
|2018-08-03T23:27:30.000Z |GMT     |
|2018-08-03T23:27:30.000Z |SGT     |
+-------------------------+--------+
How could one compute the following local_time column (preferably without a UDF)?
+-------------------------+--------+-----------------------+
|utc_time                 |timezone|local_time             |
+-------------------------+--------+-----------------------+
|2018-08-03T23:27:30.000Z |PST     |2018-08-03T15:27:30.000|
|2018-08-03T23:27:30.000Z |GMT     |2018-08-03T23:27:30.000|
|2018-08-03T23:27:30.000Z |SGT     |2018-08-04T07:27:30.000|
+-------------------------+--------+-----------------------+
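For reference, the desired conversions above can be checked with plain Python date arithmetic, assuming fixed offsets for the three abbreviations (PST = UTC-8, GMT = UTC+0, SGT = UTC+8) and ignoring daylight saving time (note that Spark's actual resolution of short zone IDs like "PST" may apply DST rules):

```python
from datetime import datetime, timedelta

# Assumed fixed UTC offsets (in hours) for the abbreviations in the example;
# real timezone resolution may differ (e.g. DST for PST in August).
OFFSETS = {"PST": -8, "GMT": 0, "SGT": 8}

utc = datetime(2018, 8, 3, 23, 27, 30)
for tz, hours in OFFSETS.items():
    local = utc + timedelta(hours=hours)
    print(tz, local.isoformat())
```

This reproduces the local_time values shown in the table, one row per timezone.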