
How can I parse these strings into a date so that I can eventually format them as yyyy-MM-dd? Similar questions, such as "Convert string of format MMM d yyyy hh:mm AM/PM to date using Pyspark", did not solve it.

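from pyspark.sql import functions as F

# Two sample rows; note the space-padded day in the second one
df = spark.createDataFrame(sc.parallelize([
    ['Wed Sep 30 21:06:00 1998'],
    ['Fri Apr  1 08:37:00 2022'],
]), ['Date'])
df.show()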

+--------------------+
|                Date|
+--------------------+
|Wed Sep 30 21:06:...|
|Fri Apr  1 08:37:...|
+--------------------+
# fail
df.withColumn('Date', F.to_date(F.col('Date'), "DDD MMM dd hh:mm:ss yyyy")).show()
John Stud

1 Answer


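I think you are using the wrong symbols for day-of-week and hour. In the pattern, D means day-of-year while E means day-of-week, and hh is the 12-hour clock hour while HH is the 24-hour one. Try this: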

from pyspark.sql.functions import to_date

df = spark.createDataFrame([('Wed Sep 30 21:06:00 1998',), ('Fri Apr  1 08:37:00 2022',)], 'Date: string')
df.withColumn('Date', to_date('Date', "E MMM dd HH:mm:ss yyyy")).show()

+----------+
|      Date|
+----------+
|1998-09-30|
|2022-04-01|
+----------+
Bartosz Gajda
  • Thanks, this works with Spark < 3.0; what about modern Spark? – John Stud Nov 15 '22 at 20:12
  • In Spark 3.x you can set `spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")` to keep the parsing behavior of Spark < 3.0. Related topic: https://stackoverflow.com/questions/62602720/string-to-date-migration-from-spark-2-0-to-3-0-gives-fail-to-recognize-eee-mmm – Bartosz Gajda Nov 16 '22 at 08:45