
I am trying to learn Spark, and I am converting a DataFrame's timestamp column using the unix_timestamp function, as below:

  import spark.implicits._
  import org.apache.spark.sql.functions.unix_timestamp

  val columnName = "TIMESTAMPCOL"
  // The timestamp literal must be a String, otherwise this does not compile
  val sequence = Seq("2016-01-20 12:05:06.999")
  val dataframe = sequence.toDF(columnName)
  val typeDataframe = dataframe.withColumn(columnName, unix_timestamp($"TIMESTAMPCOL"))
  typeDataframe.show

This produces an output:

+------------+
|TIMESTAMPCOL|
+------------+
|  1453320306|
+------------+

How can I read it so that I don't lose the ms, i.e. the .999 part? I tried using unix_timestamp(col: Column, s: String), where s is a SimpleDateFormat pattern, e.g. "yyyy-MM-dd hh:mm:ss", without any luck.
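For reference, this is the pattern variant I tried; as I understand it, unix_timestamp always returns a long in whole seconds, so the fractional part is dropped regardless of the pattern:

  // Variant with an explicit SimpleDateFormat pattern -- the result is
  // still a long in whole seconds, so the .999 is truncated either way
  val withPattern = dataframe.withColumn(
    columnName,
    unix_timestamp($"TIMESTAMPCOL", "yyyy-MM-dd hh:mm:ss")
  )
  withPattern.show // still prints 1453320306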

rgamber
  • `date_format` uses Java SimpleDateFormat internally, so you will get the full time in milliseconds as well. Possible duplicate of [Handling microseconds in Spark Scala](http://stackoverflow.com/questions/41879125/handling-microseconds-in-spark-scala) – Ram Ghadiyaram Feb 14 '17 at 03:27

1 Answer


To retain the milliseconds, use the "yyyy-MM-dd HH:mm:ss.SSS" format with date_format, as below. Note that it returns a formatted string column, not a numeric value.

import org.apache.spark.sql.functions.date_format

val typeDataframe = dataframe.withColumn(columnName, date_format($"TIMESTAMPCOL", "yyyy-MM-dd HH:mm:ss.SSS"))
// show(false) disables truncation so the full value is visible
typeDataframe.show(false)

This will give you

+-----------------------+
|TIMESTAMPCOL           |
+-----------------------+
|2016-01-20 12:05:06.999|
+-----------------------+
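If you need the value as a numeric unix timestamp rather than a formatted string, a minimal sketch (assuming Spark's cast from TimestampType to DoubleType, which yields epoch seconds including the fractional part) is:

import org.apache.spark.sql.functions.col

// Cast the string to a timestamp, then to double: this gives epoch
// seconds with the fraction, e.g. 1.453320306999E9 for the sample row.
// Multiplying by 1000 and casting to long gives epoch milliseconds.
val msDataframe = dataframe.withColumn(
  columnName,
  (col(columnName).cast("timestamp").cast("double") * 1000).cast("long")
)
msDataframe.show(false) // |1453320306999|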
abaghel