I have some columns in my source files with dates that look like 4/23/19, where 4 is the month, 23 is the day, and 19 is the year 2019.
How do I convert these to a timestamp in PySpark?
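As a quick cross-check outside Spark, Python's standard library reads the same layout with `%m/%d/%y`. In this format, `%m` and `%d` accept one or two digits, and `%y` maps 19 to 2019 (values 00-68 become 2000-2068). The helper `parse_first` is hypothetical. It mirrors the coalesce idea by trying each format in order and keeping the first one that succeeds:

```python
from datetime import datetime


def parse_first(text, formats):
    """Return the first successful strptime parse of text, or None (like coalesce)."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format doesn't match; try the next one
    return None


# Day-first fails on "4/23/19" (23 is not a month); month-first succeeds
print(parse_first("4/23/19", ["%d/%m/%Y", "%m/%d/%y"]))
```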
Here's what I have so far:
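from pyspark.sql.functions import coalesce, to_timestamp

# Try each pattern in order; coalesce keeps the first one that parses (non-null)
def ParseDateFromFormats(col, formats):
    return coalesce(*[to_timestamp(col, f) for f in formats])

# df2 and field are defined elsewhere in my code
df2 = df2.withColumn("_" + field.columnName, ParseDateFromFormats(df2[field.columnName], ["dd/MM/yyyy hh:mm", "dd/MM/yyyy", "dd-MMM-yy"]).cast(field.simpleTypeName))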
None of these patterns match values like 4/23/19, and there doesn't seem to be a date format that would work.