
I have a use case where I need to add some value to a timestamp column and derive a date from it. I'm trying to use the from_unixtime and unix_timestamp functions, but they return different values for different formats.

select from_unixtime('1577582583',  'YYYY-MM-dd'), from_unixtime('1577582583',  'yyyy-MM-dd');
from_unixtime(CAST(1577582583 AS BIGINT), YYYY-MM-dd)   from_unixtime(CAST(1577582583 AS BIGINT), yyyy-MM-dd)
2020-12-29  2019-12-29

Can someone explain the difference between the 'YYYY-MM-dd' format and the 'yyyy-MM-dd' format?

vamsi
    Does this answer your question? [Difference between 'yy' and 'YY' in Java Time Pattern](https://stackoverflow.com/questions/59527605/difference-between-yy-and-yy-in-java-time-pattern) – user10938362 Feb 15 '20 at 09:44

2 Answers


YYYY is the week-based year and yyyy is the ordinary calendar year. You can check it here: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html and https://en.wikipedia.org/wiki/ISO_week_date
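
To make the split concrete, here is a minimal standalone sketch using plain java.text.SimpleDateFormat (the formatter Spark 2.x delegates to). The object name WeekYearDemo and the UTC pinning are just for illustration:

import java.text.SimpleDateFormat
import java.util.{Date, Locale, TimeZone}

object WeekYearDemo extends App {
  // Epoch second 1577582583 is 2019-12-29 01:23:03 UTC, a Sunday.
  val instant = new Date(1577582583L * 1000L)

  val weekYear = new SimpleDateFormat("YYYY-MM-dd", Locale.US)
  val calYear  = new SimpleDateFormat("yyyy-MM-dd", Locale.US)
  weekYear.setTimeZone(TimeZone.getTimeZone("UTC"))
  calYear.setTimeZone(TimeZone.getTimeZone("UTC"))

  // Under US week rules (weeks start on Sunday), 2019-12-29 already falls
  // in the first week of week-year 2020, so 'YYYY' jumps a year ahead:
  println(weekYear.format(instant)) // 2020-12-29
  println(calYear.format(instant))  // 2019-12-29
}

Away from a year boundary the two patterns agree, which is why the mix-up usually goes unnoticed until late December or early January.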

chlebek

In this case there is no difference, but the normal way is yyyy, not YYYY; see the code for FromUnixTimestamp here.

I am getting the same values for both formats. I think Catalyst takes care of it internally even if you pass the wrong format...

Here is an example that demonstrates the behaviour:

package com.examples

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object UnixTimestamp extends App {

  val spark: SparkSession = SparkSession.builder()
    .master("local")
    .appName("UnixTimestamp")
    .getOrCreate()
  Logger.getLogger("org").setLevel(Level.WARN)

  spark.sparkContext.setLogLevel("ERROR")

  import spark.implicits._

  // Convert a unix timestamp to date strings in several formats.
  // unix_timestamp() with no arguments returns the current epoch seconds.
  val dateDF = Seq(1).toDF("seq").select(
    from_unixtime(unix_timestamp(), "MM-dd-yyyy").as("date_1"),
    from_unixtime(unix_timestamp(), "dd-MM-yyyy HH:mm:ss").as("date_2"),
    from_unixtime(unix_timestamp(), "yyyy-MM-dd").as("date_3"),
    from_unixtime(unix_timestamp(), "YYYY-MM-dd").as("date_4"),
    from_unixtime(lit(1577582583), "YYYY-MM-dd").as("yourdate1"), // your example, date format 1
    from_unixtime(lit(1577582583), "yyyy-MM-dd").as("yourdate2")  // your example, date format 2
  )
  dateDF.printSchema()
  dateDF.show(false)
  println("Now the SQL / Hive way, as in the question:")
  spark.sql("select from_unixtime('1577582583', 'YYYY-MM-dd'), from_unixtime('1577582583', 'yyyy-MM-dd')").show
}

Result: here yourdate1 (YYYY-MM-dd) and yourdate2 (yyyy-MM-dd) use the two different formats, yet both return the same value:

root
 |-- date_1: string (nullable = true)
 |-- date_2: string (nullable = true)
 |-- date_3: string (nullable = true)
 |-- date_4: string (nullable = true)
 |-- yourdate1: string (nullable = true)
 |-- yourdate2: string (nullable = true)

+----------+-------------------+----------+----------+----------+----------+
|date_1    |date_2             |date_3    |date_4    |yourdate1 |yourdate2 |
+----------+-------------------+----------+----------+----------+----------+
|02-14-2020|14-02-2020 21:19:53|2020-02-14|2020-02-14|2019-12-28|2019-12-28|
+----------+-------------------+----------+----------+----------+----------+

Now the SQL / Hive way, as in the question:
+-----------------------------------------------------+-----------------------------------------------------+
|from_unixtime(CAST(1577582583 AS BIGINT), YYYY-MM-dd)|from_unixtime(CAST(1577582583 AS BIGINT), yyyy-MM-dd)|
+-----------------------------------------------------+-----------------------------------------------------+
|                                           2019-12-28|                                           2019-12-28|
+-----------------------------------------------------+-----------------------------------------------------+
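
Whether the two patterns agree depends on where the instant lands relative to the week-year boundary in your session time zone: in the run above it resolved to 2019-12-28, which is still in week-year 2019, so both columns match. A minimal sketch that reproduces the question's divergent output by pinning the session zone to UTC (assuming Spark 2.2+, where spark.sql.session.timeZone applies to these functions; Spark 3.x may reject week-based patterns entirely unless spark.sql.legacy.timeParserPolicy is set to LEGACY):

spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql(
  """select from_unixtime(1577582583, 'YYYY-MM-dd') as week_year,
    |       from_unixtime(1577582583, 'yyyy-MM-dd') as cal_year""".stripMargin
).show()
// In UTC the epoch second is 2019-12-29, a Sunday, which US week rules
// place in week-year 2020, so the columns now differ:
// week_year = 2020-12-29, cal_year = 2019-12-29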

Ram Ghadiyaram