I have timestamps with millisecond precision and need to convert them from system time to UTC. When I do the transformation, Spark swallows the milliseconds and just shows them as zeros.
Short example:
from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import to_timestamp, date_format

spark = SparkSession.builder.getOrCreate()

test = spark.createDataFrame([Row(timestamp="2018-03-24 14:37:12,133")])
# parse the string into a proper timestamp column
test_2 = test.withColumn('timestamp_2', to_timestamp('timestamp', 'yyyy-MM-dd HH:mm:ss,SSS'))
# format it back into a string with the same pattern
test_3 = test_2.withColumn('timestamp_3', date_format('timestamp_2', 'yyyy-MM-dd HH:mm:ss,SSS'))
test_3.write.option('header', True).csv('something')
This will result in:
timestamp,timestamp_2,timestamp_3
"2018-03-24 14:37:12,133",2018-03-24T14:37:12.000+01:00,"2018-03-24 14:37:12,000"
Can I somehow preserve the milliseconds?
I am using Python 3.6.4 and Spark 2.3.2.
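For context, one workaround I've sketched rebuilds the fractional seconds by hand (the names seconds, millis and timestamp_fixed are mine, and it assumes unix_timestamp's lenient parser simply ignores the trailing ',SSS' part of the string):

from pyspark.sql.functions import unix_timestamp, substring

# whole seconds: the lenient parser stops after the 'ss' token,
# so the trailing ',133' is ignored
seconds = unix_timestamp('timestamp', 'yyyy-MM-dd HH:mm:ss')
# the last three characters of the string are the milliseconds
millis = substring('timestamp', -3, 3).cast('double') / 1000
# adding them and casting the double back to timestamp keeps the fraction
test_4 = test.withColumn('timestamp_fixed', (seconds + millis).cast('timestamp'))

But that feels like a hack, so I'd prefer a way to make to_timestamp / date_format handle the SSS part directly.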