0

How to convert YYYY-MM-DD into YYYYMMDD in spark sql. Example 2021-11-25 should be 20211121.

In SQL SERVER its easy with the help of convert () or Format (). How to do it in SPARK SQL?

I do not want to convert this into DataFrame or DF .Looking for an answer using spark sql.

This is not a duplicate question.I need to get this done using SQL Api not Pyspark!

user1254579
  • 3,901
  • 21
  • 65
  • 104
  • 1
    Remember that YYYY-MM-DD is the ISO date format. – jarlh Nov 25 '21 at 11:42
  • 1
    Does this answer your question? [Convert date from String to Date format in Dataframes](https://stackoverflow.com/questions/40763796/convert-date-from-string-to-date-format-in-dataframes), or even the [docs](https://spark.apache.org/docs/3.2.0/api/sql/index.html#date_format) – Luuk Nov 25 '21 at 11:44
  • sorry, all those are DataFrame manipulation.I m looking for the spark sql – user1254579 Nov 25 '21 at 12:29

1 Answers1

1

Please first check schema of your table as field can be date or string.

# dateframe with date as string and date as date
df = (spark.createDataFrame([{"date_str": "2021-11-01", "date_str": "2021-11-02"}])
      .withColumn("date_date", expr(" to_date(date_str) "))
     )

df.show()
df.schema

>>Out[1]:
>>+----------+----------+
>>|  date_str| date_date|
>>+----------+----------+
>>|2021-11-02|2021-11-02|
>>+----------+----------+
>>Out[2]: StructType(List(StructField(date_str,StringType,true),StructField(date_date,DateType,true)))

We can see above that both our string date and date as date object are as: YYYY-MM-DD. Let's now convert both to YYYYMMDD:

df_converted = (df
.withColumn("date_str_converted", expr(" date_format(to_date(date_str), 'yyyyMMdd') "))
.withColumn("date_date_converted", expr(" date_format(date_date, 'yyyyMMdd')  "))
               )
            
df_converted.show()
>>Out[3]:
>>+----------+----------+------------------+-------------------+
>>|  date_str| date_date|date_str_converted|date_date_converted|
>>+----------+----------+------------------+-------------------+
>>|2021-11-02|2021-11-02|          20211102|           20211102|
>>+----------+----------+------------------+-------------------+
Hubert Dudek
  • 1,666
  • 1
  • 13
  • 21
  • 1
    Thanks for the answer in Python spark.Can this be done using SQL API ? – user1254579 Nov 25 '21 at 12:32
  • 3
    yes that's why I used expr basically you need to put what is in expression to SELECT so for example SELECT date_format(to_date(date_str), 'yyyyMMdd') FROM df (just remember that you need to register before that table or view if you have nonsql dataframe) – Hubert Dudek Nov 25 '21 at 13:19