I would like to convert a string column of a DataFrame to a date in PySpark.

from pyspark.sql.functions import col, to_date

l = [("10/14/2000", "12/4/1991", "5/8/1991"), ("11/3/1391", "1/26/1992", "9/5/1992")]
df = spark.createDataFrame(l, ["first", "second", "third"])

# First attempt, using a Spark-style pattern:
df2 = df.select(col("first"), to_date(col("first"), "MM/dd/yyyy").alias("date"))
# Second attempt, using a strftime-style pattern:
df3 = df.select(col("first"), to_date(col("first"), "%M/%d/%y").alias("date"))

I tried the code above, but neither attempt worked.

Could somebody help me solve this issue?


1 Answer

The code snippet you are using is correct; however, the date format you are using for parsing is not in line with Spark 3.x.

Furthermore, to handle inconsistent cases like 10/14/2000 and 11/3/1391 with the MM/dd/yyyy pattern, you can set timeParserPolicy=LEGACY on Spark 3.x, as shown in the sketch below.
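A minimal sketch of enabling the legacy parser at runtime, assuming an active SparkSession named spark:

# Restores Spark 2.x's lenient SimpleDateFormat-based parsing,
# which accepts single-digit months and days against an MM/dd pattern.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")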

The available datetime patterns for parsing are documented here: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html. Note that Spark uses Java-style pattern letters (MM, dd, yyyy), not strftime codes such as %M or %d, which is why your second attempt did not work.

Data Preparation

l = [("10/14/2000", "12/4/1991", "5/8/1991"), ("11/3/1391", "1/26/1992", "9/5/1992")]

df = spark.createDataFrame(l, ["first", "second", "third"])

df.show()

+----------+---------+--------+
|     first|   second|   third|
+----------+---------+--------+
|10/14/2000|12/4/1991|5/8/1991|
| 11/3/1391|1/26/1992|9/5/1992|
+----------+---------+--------+

To Date

from pyspark.sql import functions as F

df.select(F.col("first"), F.to_date(F.col("first"), "MM/dd/yyyy").alias("date")).show()

+----------+----------+
|     first|      date|
+----------+----------+
|10/14/2000|2000-10-14|
| 11/3/1391|1391-11-03|
+----------+----------+
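
The same approach extends to the remaining columns; a minimal sketch, assuming the LEGACY policy above is in effect (the parsed name is just for illustration):

from pyspark.sql import functions as F

# Parse every string column with the same MM/dd/yyyy pattern;
# with the LEGACY policy set, single-digit months and days still parse.
parsed = df.select(*[F.to_date(F.col(c), "MM/dd/yyyy").alias(c) for c in df.columns])
parsed.printSchema()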