
I have some columns with dates from source files that look like 4/23/19.

The 4 is the month, the 23 is the day, and the 19 means 2019.

How do I convert this to a timestamp in PySpark?

So far I have:

from pyspark.sql.functions import coalesce, to_timestamp

def ParseDateFromFormats(col, formats):
    # coalesce keeps the first format that parses to a non-null timestamp
    return coalesce(*[to_timestamp(col, f) for f in formats])

df2 = df2.withColumn("_" + field.columnName,
                     ParseDateFromFormats(df2[field.columnName],
                                          ["dd/MM/yyyy hh:mm", "dd/MM/yyyy", "dd-MMM-yy"]).cast(field.simpleTypeName))

None of the date formats I've tried seems to work.

Tiger_Stripes
    Does this answer your question? [Convert pyspark string to date format](https://stackoverflow.com/questions/38080748/convert-pyspark-string-to-date-format) – pault Nov 14 '19 at 15:01

1 Answer


The reason your code didn't work is likely that you reversed the day and month: your format patterns start with dd/MM, but the data is month-first (MM/dd). This works:

from pyspark.sql.functions import to_date

time_df = spark.createDataFrame([('4/23/19',)], ['dt'])
# 'MM/dd/yy' matches month-first dates such as 4/23/19
time_df.withColumn('proper_date', to_date('dt', 'MM/dd/yy')).show()
+-------+-----------+
|     dt|proper_date|
+-------+-----------+
|4/23/19| 2019-04-23|
+-------+-----------+
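
Since the question asks for a timestamp rather than a date, the same month-first pattern should also work with to_timestamp, which returns a TimestampType column; a minimal sketch, reusing the time_df DataFrame from above:

from pyspark.sql.functions import to_timestamp

# Parses to TimestampType; with no time-of-day in the source string,
# the time part defaults to midnight, e.g. 4/23/19 -> 2019-04-23 00:00:00
time_df.withColumn('proper_ts', to_timestamp('dt', 'MM/dd/yy')).show()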
Napoleon Borntoparty