
I have a DataFrame with a date column containing Unix timestamps (in milliseconds) as strings. The column values need to be reformatted to a different string representation, as below:

Input DataFrame

+----+----+-------------+
|name|code|         date|
+----+----+-------------+
|   A|   1|1545905416000|
|   B|   3|1545905416000|
|   C|   5|1545905416000|
+----+----+-------------+

Expected output DataFrame

+----+----+-------------+
|name|code|         date|
+----+----+-------------+
|   A|   1|   2018-12-27|
|   B|   3|   2018-12-27|
|   C|   5|   2018-12-27|
+----+----+-------------+
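For reference, a minimal sketch that reproduces the input, assuming an existing SparkSession named spark (the variable name peopleDFCsv below matches my attempt, though I actually load the data from a file):

    import java.util.Arrays;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    // Schema mirrors the table above: date is a string of epoch milliseconds
    StructType schema = new StructType()
            .add("name", DataTypes.StringType)
            .add("code", DataTypes.IntegerType)
            .add("date", DataTypes.StringType);

    Dataset<Row> peopleDFCsv = spark.createDataFrame(Arrays.asList(
            RowFactory.create("A", 1, "1545905416000"),
            RowFactory.create("B", 3, "1545905416000"),
            RowFactory.create("C", 5, "1545905416000")), schema);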

This didn't work; it gives null for all values:

peopleDFCsv.withColumn("formatted_date",
        functions.date_format(functions.col("date"), "yyyy-MM-dd"))
        .show();
shriyog
  • See also [Scala: Spark SQL to_date(unix_timestamp) returning NULL](https://stackoverflow.com/q/40433065/10465355) – 10465355 Dec 27 '18 at 12:20

1 Answer


The date_format function works with timestamps, not milliseconds since the epoch. Try doing the conversion with CAST:

df.withColumn("formatted_date",
        functions.date_format(functions.expr("CAST(date/1000 AS TIMESTAMP)"), "yyyy-MM-dd"))
        .show();

//    Outputs:
//    +----+----+-------------+--------------+
//    |name|code|         date|formatted_date|
//    +----+----+-------------+--------------+
//    |   A|   1|1545905416000|    2018-12-27|
//    |   B|   3|1545905416000|    2018-12-27|
//    |   C|   5|1545905416000|    2018-12-27|
//    +----+----+-------------+--------------+
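Note that your expected output replaces the date column itself rather than adding a new column. Passing an existing column name to withColumn overwrites it, so the same expression gives the exact output you asked for:

    df.withColumn("date",
            functions.date_format(functions.expr("CAST(date/1000 AS TIMESTAMP)"), "yyyy-MM-dd"))
            .show();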

The reason why you are getting null is that date_format interprets your string as a timestamp such as "2018-12-27 11:10:16". The plain number does not conform to the format it expects, so it just returns null.
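As an alternative sketch (not required for the fix above): from_unixtime takes seconds since the epoch plus a target format, so you can produce the date string without a separate date_format call. The division implicitly casts the string to a double, and the cast to long truncates back to whole seconds:

    df.withColumn("formatted_date",
            functions.from_unixtime(
                    functions.col("date").divide(1000).cast("long"),
                    "yyyy-MM-dd"))
            .show();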

Frank Wilson