It seems that I can't make date_format
work. Using a format that I know work on my data (see below)
import org.apache.spark.sql.functions._
dat.withColumn("ts", date_format(dat("timestamp"), "MMM-dd-yyyy hh:mm:ss:SSS a (z)")).select("timestamp", "ts").first
I get
res310: org.apache.spark.sql.Row = [Aug-11-2016 09:21:43:749 PM (CEST),null]
Reading the doc I understand that date_format
should accept any SimpleDateFormat
. Is that correct?
I can make it work going through the pain of the code below:
val timestamp_parser = new SimpleDateFormat("MMM-dd-yyyy hh:mm:ss:SSS a (z)")
val udf_timestamp_string_to_long = udf[Long, String]( timestamp_parser.parse(_).getTime() )
val udf_timestamp_long_to_sql_timestamp = udf[Timestamp, Long]( new Timestamp(_) )
dat.withColumn("ts", udf_timestamp_long_to_sql_timestamp(udf_timestamp_string_to_long(dat("timestamp")))).select("timestamp", "ts").first
which gives
res314: org.apache.spark.sql.Row = [Aug-11-2016 09:21:43:749 PM (CEST),2016-08-11 21:21:43.749]