Sample Code:
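import org.apache.spark.sql.functions.to_timestamp

val sparkSession = SparkUtil.getSparkSession("timestamp_format_test")
import sparkSession.implicits._
// Both the pattern and the sample value carry a millisecond part (.SSS / .927)
val format = "yyyy/MM/dd HH:mm:ss.SSS"
val time = "2018/12/21 08:07:36.927"
val df = sparkSession.sparkContext.parallelize(Seq(time)).toDF("in_timestamp")
// Parse the string column into a timestamp column using the pattern above
val df2 = df.withColumn("out_timestamp", to_timestamp(df.col("in_timestamp"), format))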
Output:
df2.show(false)
Note that out_timestamp has lost the millisecond part of the original value:
+-----------------------+-------------------+
|in_timestamp           |out_timestamp      |
+-----------------------+-------------------+
|2018/12/21 08:07:36.927|2018-12-21 08:07:36|
+-----------------------+-------------------+
df2.printSchema()
root
 |-- in_timestamp: string (nullable = true)
 |-- out_timestamp: timestamp (nullable = true)
In the result above, in_timestamp is a string, and I would like to convert it to the timestamp data type. The conversion itself works, but the millisecond part is lost. Any ideas? Thanks!
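One possible workaround, offered as a sketch rather than a definitive fix: rewrite the string into the default yyyy-MM-dd HH:mm:ss.SSS layout and cast it to timestamp, since a plain string-to-timestamp cast keeps fractional seconds. This assumes every input value follows the yyyy/MM/dd HH:mm:ss.SSS shape shown above. The names KeepMillis and parseWithMillis are hypothetical, and a local SparkSession.builder() stands in for the SparkUtil helper, which is not shown in the question.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object KeepMillis {
  // Hypothetical helper: turn "yyyy/MM/dd HH:mm:ss.SSS" into "yyyy-MM-dd HH:mm:ss.SSS"
  // and let the string-to-timestamp cast parse it, fractional seconds included.
  // Assumption: the only "/" characters in the value are the date separators.
  def parseWithMillis(c: Column): Column =
    regexp_replace(c, "/", "-").cast("timestamp")

  def main(args: Array[String]): Unit = {
    // Local session used here in place of SparkUtil.getSparkSession (assumption)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("timestamp_format_test")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("2018/12/21 08:07:36.927").toDF("in_timestamp")
    val df2 = df.withColumn("out_timestamp", parseWithMillis(col("in_timestamp")))

    df2.show(false)   // out_timestamp should now include the .927 part
    df2.printSchema() // out_timestamp is still of type timestamp

    spark.stop()
  }
}
```

The cast route avoids the format-based parser entirely, so it only fits inputs that can be reshaped into the default layout with a simple replacement. For other patterns, a different approach (for example, parsing the seconds and the fractional part separately) would be needed.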