I can't get date_format to work, even with a format that I know works on my data (see below):

import org.apache.spark.sql.functions._
dat.withColumn("ts", date_format(dat("timestamp"), "MMM-dd-yyyy hh:mm:ss:SSS a (z)")).select("timestamp", "ts").first

I get

res310: org.apache.spark.sql.Row = [Aug-11-2016 09:21:43:749 PM (CEST),null]

Reading the docs, I understand that date_format should accept any SimpleDateFormat pattern. Is that correct?

I can make it work by going through the pain of the code below:

import java.sql.Timestamp
import java.text.SimpleDateFormat
// String -> epoch millis -> java.sql.Timestamp, chained through two UDFs
val timestamp_parser = new SimpleDateFormat("MMM-dd-yyyy hh:mm:ss:SSS a (z)")
val udf_timestamp_string_to_long = udf[Long, String]( timestamp_parser.parse(_).getTime() )
val udf_timestamp_long_to_sql_timestamp = udf[Timestamp, Long]( new Timestamp(_) )
dat.withColumn("ts", udf_timestamp_long_to_sql_timestamp(udf_timestamp_string_to_long(dat("timestamp")))).select("timestamp", "ts").first

which gives

res314: org.apache.spark.sql.Row = [Aug-11-2016 09:21:43:749 PM (CEST),2016-08-11 21:21:43.749]
OneCricketeer
Cedric H.
  • I am not sure if [this](http://stackoverflow.com/questions/39281152/spark-unix-timestamp-data-type-mismatch/39281631#39281631) can help at all... – gsamaras Sep 02 '16 at 16:30
  • Possible duplicate of [Better way to convert a string field into timestamp in Spark](http://stackoverflow.com/questions/29844144/better-way-to-convert-a-string-field-into-timestamp-in-spark) – zero323 Sep 02 '16 at 16:31
  • If you want to convert a string to a timestamp field, you're using the wrong function. `date_format` is used to create a formatted string, not to parse dates. – zero323 Sep 02 '16 at 16:32
  • @zero323 Hmmm that's embarrassing. – Cedric H. Sep 02 '16 at 16:41
  • @CedricH. Happens to the best of us :) – zero323 Sep 02 '16 at 16:47
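
The distinction zero323 draws in the comments (formatting versus parsing) can be illustrated without Spark, using the same SimpleDateFormat pattern as in the question. This is only a minimal sketch: the object and method names (`FormatVsParse`, `parseTimestamp`, `renderTimestamp`) are hypothetical, and the `Europe/Paris` zone and `Locale.ENGLISH` are assumptions made so that "CEST", "Aug" and "PM" resolve the same way on any JVM.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

// Hypothetical helper object, not part of Spark or of the original question.
object FormatVsParse {
  // Pattern taken verbatim from the question.
  val pattern = "MMM-dd-yyyy hh:mm:ss:SSS a (z)"

  // SimpleDateFormat is not thread-safe, so build a fresh one per call.
  // Locale and time zone are assumptions (the question only shows "CEST").
  private def newFormatter(): SimpleDateFormat = {
    val f = new SimpleDateFormat(pattern, Locale.ENGLISH)
    f.setTimeZone(TimeZone.getTimeZone("Europe/Paris"))
    f
  }

  // Parsing: String -> Date. This is the direction the question needs.
  def parseTimestamp(raw: String): Date = newFormatter().parse(raw)

  // Formatting: Date -> String. This is the direction date_format works in.
  def renderTimestamp(d: Date): String = newFormatter().format(d)

  def main(args: Array[String]): Unit = {
    val raw = "Aug-11-2016 09:21:43:749 PM (CEST)"
    val parsed = parseTimestamp(raw)
    println(parsed.getTime)          // epoch milliseconds
    println(renderTimestamp(parsed)) // back to a string in the same pattern
  }
}
```

Handing `date_format` a string column is the formatting direction applied to input that has not been parsed yet, which is consistent with the `null` shown in the question. The duplicate linked in the comments covers the parsing-side options in Spark itself.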

0 Answers