The code snippet you are using is correct; however, the date format you are using for parsing is not in line with Spark 3.x.
Furthermore, to handle inconsistent cases like 10/14/2000 and 11/3/1391 with the pattern MM/dd/yyyy, you can set spark.sql.legacy.timeParserPolicy to LEGACY, which falls back to the pre-3.0 parser.
The datetime patterns available for parsing in Spark 3.x are documented here - https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
Data Preparation
from pyspark.sql import functions as F

l = [("10/14/2000", "12/4/1991", "5/8/1991"), ("11/3/1391", "1/26/1992", "9/5/1992")]
df = spark.createDataFrame(l, ["first", "second", "third"])
df.show()
+----------+---------+--------+
| first| second| third|
+----------+---------+--------+
|10/14/2000|12/4/1991|5/8/1991|
| 11/3/1391|1/26/1992|9/5/1992|
+----------+---------+--------+
To Date
df.select(F.col("first"),F.to_date(F.col("first"),"MM/dd/yyyy").alias("date")).show()
+----------+----------+
| first| date|
+----------+----------+
|10/14/2000|2000-10-14|
| 11/3/1391|1391-11-03|
+----------+----------+