4

I have a column in my Dataframe timestamp, which contains UNIX 13-digit timestamps as follows:

|   timestamp   | 
| ------------- |
| 1584528257638 |
| 1586618807677 |
| 1585923477767 |
| 1583314882085 |

With pandas it is fairly easy to convert it like:

ms = pd.to_datetime(df[column], unit='ms')
df[column] = ms

However, in pySpark it is not that easy and I found some others, such as this post trying to achieve this goal. The concatenation of the last Milliseconds does not work for me, it always results in the Second timestamp (HH:mm:ss) instead of HH:mm:ss.SSS.

What I tried so far was:

df = df.withColumn("unix_timestamp", F.unix_timestamp(df.timestamp,'yyyy-MM-dd HH:mm:ss.SSS z') + F.substring(df.timestamp, -3,3).cast('float')/1000)

df = df.withColumn("ms_Timestamp", F.to_timestamp(df["unix_timestamp"]))

This unfortunately, didn't convert it into a millisecond timestamp and I have no idea what else to do.

I would appreciate any help to finally get a millisecond timestamp.

All the best and thanks in advance.

lenlehm
  • 67
  • 1
  • 9
  • Can you try usng `df = df.withColumn("timestamp", F.to_timestamp(F.col("timestamp")))`? – pissall Apr 16 '20 at 09:12
  • That is unfortunately not working for me. Results in the normal seconds timestamp and discards the milliseconds. – lenlehm Apr 17 '20 at 15:40

1 Answers1

2

Default to_timestamp, from_unixtime, unix_timestamp functions will not result milliseconds.

But to get workaround use from_unixtime and concat functions to get the timestamp with millisecs.

#using substring function
df.withColumn("unix_timestamp", concat_ws(".",from_unixtime(substring(col("timestamp"),0,10),"yyyy-MM-dd HH:mm:ss"),substring(col("timestamp"),-3,3))).show(10,False)

#using divide function
df.withColumn("unix_timestamp", concat_ws(".",from_unixtime((col("timestamp")/1000),"yyyy-MM-dd HH:mm:ss"),substring(col("timestamp"),-3,3))).show(10,False)
#+-------------+-----------------------+
#|timestamp    |unix_timestamp         |
#+-------------+-----------------------+
#|1584528257638|2020-03-18 05:44:17.638|
#|1586618807677|2020-04-11 10:26:47.677|
#|1585923477767|2020-04-03 09:17:57.767|
#|1583314882085|2020-03-04 03:41:22.085|
#+-------------+-----------------------+
notNull
  • 30,258
  • 4
  • 35
  • 50