
From the unix_timestamp reference:

Convert time string with given pattern (‘yyyy-MM-dd HH:mm:ss’, by default) to Unix time stamp (in seconds), using the default timezone and the default locale, return null if fail.

I find that this drops milliseconds off DataFrame timestamp columns. I am just wondering whether it simply truncates, or rounds the timestamp to the nearest second.

Alex

1 Answer


I can't find documentation to back this up, but in Spark 2.2.0 it's truncation. Here is a demo (if unix_timestamp rounded, the 10:20:30.999 row below would map to second 31):

from pyspark.sql import Row
import pyspark.sql.functions as F

# Two rows with different millisecond parts (assumes an active SparkSession
# named `spark`, e.g. in the pyspark shell).
r = Row('datetime')
lst = [r('2017-10-29 10:20:30.102'), r('2017-10-29 10:20:30.999')]
df = spark.createDataFrame(lst)

# unix_timestamp drops the millisecond part; from_unixtime extracts the seconds kept.
(df.withColumn('trunc_datetime', F.unix_timestamp(F.col('datetime')))
   .withColumn('seconds', F.from_unixtime(F.col('trunc_datetime'), 'ss'))
   .show(2, False))

+-----------------------+--------------+-------+
|datetime               |trunc_datetime|seconds|
+-----------------------+--------------+-------+
|2017-10-29 10:20:30.102|1509286830    |30     |
|2017-10-29 10:20:30.999|1509286830    |30     |
+-----------------------+--------------+-------+
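
If the fractional seconds are needed, one workaround (not part of the answer above; just a sketch that reuses the df built in the demo) is to cast the string to a timestamp and then to double, which keeps the sub-second part:

import pyspark.sql.functions as F

# Sketch: timestamp -> double yields epoch seconds with the fractional part intact,
# unlike unix_timestamp, which truncates to whole seconds. Reuses `df` from above.
(df.withColumn('epoch_seconds', F.col('datetime').cast('timestamp').cast('double'))
   .show(2, False))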
Psidom
    Thanks! I was trying to make an example like this but I couldn't figure out how to make a DataFrame with more than one row. (Confirmed the same behaviour in 2.1) – Alex Oct 30 '17 at 03:49
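
As a side note on the comment above: a multi-row DataFrame can also be built from a plain list of tuples plus a list of column names, without the Row helper. A minimal sketch, assuming an active SparkSession named spark (as in the pyspark shell):

# Each one-element tuple becomes a row; the column-name list supplies the schema.
df2 = spark.createDataFrame(
    [('2017-10-29 10:20:30.102',), ('2017-10-29 10:20:30.999',)],
    ['datetime'],
)
df2.show(2, False)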