
I'm new to Spark SQL and am trying to convert a string to a timestamp in a Spark DataFrame. I have a string that looks like '2017-08-01T02:26:59.000Z' in a column called time_string.

My code to convert this string to timestamp is

CAST (time_string AS Timestamp)

But this gives me a timestamp of 2017-07-31 19:26:59

Why is it changing the time? Is there a way to do this without changing the time?

Thanks for any help!

ZygD
Jessica

3 Answers


You can use the unix_timestamp function to parse the UTC-formatted date string into a timestamp (note that unix_timestamp has second precision, so the milliseconds are dropped):

import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType
import spark.implicits._

val df2 = Seq(("a3fac", "2017-08-01T02:26:59.000Z")).toDF("id", "eventTime")

df2.withColumn("eventTime1", unix_timestamp($"eventTime", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").cast(TimestampType))

Output:

+-----+---------------------+
|id   |eventTime1           |
+-----+---------------------+
|a3fac|2017-08-01 02:26:59.0|
+-----+---------------------+
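As for why the CAST in the question changed the time: Spark renders timestamps in the session timezone, so a UTC instant is shifted to local wall-clock time on display. A minimal java.time sketch (plain Java, no Spark; America/Los_Angeles is an assumption about the asker's zone, since UTC-7 in August matches US Pacific Daylight Time) reproduces exactly the shift the question describes:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimezoneDemo {
    public static void main(String[] args) {
        // The 'Z' suffix marks the string as a UTC instant.
        Instant utc = Instant.parse("2017-08-01T02:26:59.000Z");

        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

        // Rendered in UTC, the wall-clock time is unchanged...
        System.out.println(fmt.format(utc.atZone(ZoneId.of("UTC"))));
        // prints 2017-08-01 02:26:59

        // ...but rendered in US Pacific time (UTC-7 in August, an assumed
        // session zone) it shifts back 7 hours, matching the CAST output.
        System.out.println(fmt.format(utc.atZone(ZoneId.of("America/Los_Angeles"))));
        // prints 2017-07-31 19:26:59
    }
}
```

In Spark the equivalent knob is the spark.sql.session.timeZone configuration; setting it to "UTC" makes CAST display the original wall-clock time.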

Hope this helps!

koiralo

Solution in Java

There are several Spark SQL functions that let you work with date formats.

Conversion example : 20181224091530 -> 2018-12-24 09:15:30

Solution (Spark SQL statement) :

SELECT
 ...
 to_timestamp(cast(DECIMAL_DATE as string),'yyyyMMddHHmmss') as `TIME STAMP DATE`,
 ...
FROM some_table

You can run SQL statements through an instance of org.apache.spark.sql.SparkSession. For example, to execute an SQL statement, Spark provides the following:

...
// sparkSession is an existing instance of org.apache.spark.sql.SparkSession
Dataset<Row> result = sparkSession.sql(sqlStatement);
...

Notes:

  • You have to convert the decimal to a string; after that you can parse it to a timestamp
  • You can adjust the format pattern to get whatever output format you want
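The two notes above can be sketched in plain Java with java.time (no Spark needed; the value 20181224091530 is taken from the answer's own conversion example):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DecimalDateDemo {
    public static void main(String[] args) {
        // The DECIMAL_DATE value from the answer's example.
        long decimalDate = 20181224091530L;

        // Step 1: convert the decimal to a string, as the first note says.
        String asString = Long.toString(decimalDate);

        // Step 2: parse it with the same pattern used in the SQL statement.
        DateTimeFormatter in = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
        LocalDateTime ts = LocalDateTime.parse(asString, in);

        // Render in the target layout.
        DateTimeFormatter out = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(out.format(ts)); // prints 2018-12-24 09:15:30
    }
}
```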
RazvanParautiu
  • In my case I used `to_timestamp(cast(timestamp as string),'yyyyMMddHHmmss')` and the output comes out as null only, whereas my timestamp column value is `2019/08/02 15:00:00`. Let me know if I am missing anything. – Indrajeet Gour Apr 25 '21 at 15:19
  1. In Spark SQL you can use to_timestamp and then format it as you require: select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event

  2. Here 'timestamp' has the value 2019/02/23 12:00:00 and is a StringType column in the 'event' table. To convert it to TimestampType, apply to_timestamp(timestamp, 'yyyy/MM/dd HH:mm:ss'). Make sure the format pattern matches your column's values, then apply date_format to render it as required.

> select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event
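As a sketch of the same parse-then-reformat chain in plain Java (java.time, no Spark; the input value is the one quoted in this answer):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ReformatDemo {
    public static void main(String[] args) {
        // The StringType value from the 'event' table in the answer.
        String raw = "2019/02/23 12:00:00";

        // Input pattern must match the stored string exactly,
        // mirroring to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss').
        DateTimeFormatter in = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

        // Output pattern mirrors date_format(..., "yyyy-MM-dd HH:mm:ss").
        DateTimeFormatter out = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

        String formatted = out.format(LocalDateTime.parse(raw, in));
        System.out.println(formatted); // prints 2019-02-23 12:00:00
    }
}
```

This also shows why the commenter above got null: their data uses the 'yyyy/MM/dd HH:mm:ss' layout, so parsing it with 'yyyyMMddHHmmss' cannot match.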