
From the unix_timestamp reference:

Convert time string with given pattern (‘yyyy-MM-dd HH:mm:ss’, by default) to Unix time stamp (in seconds), using the default timezone and the default locale, return null if fail.

I find that this drops milliseconds off DataFrame timestamp columns. I am just wondering whether it simply truncates, or rounds the timestamp to the nearest second.

Alex

1 Answer


I can't find documentation to back this up, but in Spark 2.2.0 it's truncation. Here is a demo (if unix_timestamp rounded, the 10:20:30.999 row below would map to second 31):

from pyspark.sql import Row
import pyspark.sql.functions as F

# Two rows with different millisecond parts (assumes an active SparkSession
# named `spark`, e.g. in the pyspark shell).
r = Row('datetime')
lst = [r('2017-10-29 10:20:30.102'), r('2017-10-29 10:20:30.999')]
df = spark.createDataFrame(lst)

# unix_timestamp drops the millisecond part; from_unixtime extracts the seconds kept.
(df.withColumn('trunc_datetime', F.unix_timestamp(F.col('datetime')))
   .withColumn('seconds', F.from_unixtime(F.col('trunc_datetime'), 'ss'))
   .show(2, False))

+-----------------------+--------------+-------+
|datetime               |trunc_datetime|seconds|
+-----------------------+--------------+-------+
|2017-10-29 10:20:30.102|1509286830    |30     |
|2017-10-29 10:20:30.999|1509286830    |30     |
+-----------------------+--------------+-------+
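
If the fractional seconds are needed, one workaround (not part of the answer above; just a sketch that reuses the df built in the demo) is to cast the string to a timestamp and then to double, which keeps the sub-second part:

import pyspark.sql.functions as F

# Sketch: timestamp -> double yields epoch seconds with the fractional part intact,
# unlike unix_timestamp, which truncates to whole seconds. Reuses `df` from above.
(df.withColumn('epoch_seconds', F.col('datetime').cast('timestamp').cast('double'))
   .show(2, False))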
Psidom
    Thanks! I was trying to make an example like this but I couldn't figure out how to make a DataFrame with more than one row. (Confirmed the same behaviour in 2.1) – Alex Oct 30 '17 at 03:49
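
As a side note on the comment above: a multi-row DataFrame can also be built from a plain list of tuples plus a list of column names, without the Row helper. A minimal sketch, assuming an active SparkSession named spark (as in the pyspark shell):

# Each one-element tuple becomes a row; the column-name list supplies the schema.
df2 = spark.createDataFrame(
    [('2017-10-29 10:20:30.102',), ('2017-10-29 10:20:30.999',)],
    ['datetime'],
)
df2.show(2, False)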