My question is about how to convert a Unicode date/time string to a Python datetime in PySpark.
I have written a machine learning program using PySpark in a Databricks / AWS environment. All my code works well except for converting a Unicode string (u'4/6/2017 13:25') to a Python datetime.
I want to determine the difference in time between today and the purchase date.
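To make the goal concrete for a single value outside Spark, here is a minimal sketch in plain Python. It assumes the sample string is laid out as month/day/year followed by a 24-hour time; the names PURCHASE_FORMAT and time_since_purchase are hypothetical and used only for illustration.

```python
from datetime import datetime

# Assumed layout of the sample value '4/6/2017 13:25':
# month/day/year, a space, then 24-hour hours and minutes.
PURCHASE_FORMAT = "%m/%d/%Y %H:%M"


def time_since_purchase(purchase_date, now):
    """Return the timedelta between `now` and a purchase date string."""
    purchased = datetime.strptime(purchase_date, PURCHASE_FORMAT)
    return now - purchased


# A fixed reference time keeps the example reproducible;
# in practice this would be datetime.now().
reference = datetime(2017, 4, 10, 13, 25)
delta = time_since_purchase(u"4/6/2017 13:25", reference)
print(delta.days)  # 4
```

The result is a timedelta, so days, seconds, or total_seconds() can be read from it depending on the granularity needed.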
My code is:
historicalE = historicalD.withColumn('new_purchase_date', f.date_format(historicalD.purchase_date.cast(dataType=t.TimestampType()), "%m-%d-%YT%H:%MZ"))
After running the code, the value of new_purchase_date is None.
In another attempt, I tried:
historicalE = historicalD.withColumn('new_purchase_date', datetime.datetime.strptime(historicalD.purchase_date, '%m-%d-%YT%H:%M'))
This raised an error saying that the argument must be a string, not a Column.
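The error from the second attempt can be reproduced without Spark, which may help narrow things down. datetime.strptime accepts only a Python string, whereas historicalD.purchase_date is a Column expression that describes the whole column rather than one value. The sketch below is an illustration under assumptions: FakeColumn is a hypothetical stand-in class, not part of PySpark, and try_parse is a hypothetical helper. It also shows that the pattern with dashes and a 'T' does not match the sample value even when a real string is passed.

```python
from datetime import datetime


class FakeColumn(object):
    """Hypothetical stand-in for a Spark Column; not part of any library."""


def try_parse(value, fmt):
    """Return the parsed datetime, or the name of the exception raised."""
    try:
        return datetime.strptime(value, fmt)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


# A non-string argument is rejected before any parsing happens.
column_result = try_parse(FakeColumn(), "%m-%d-%YT%H:%M")

# A real string still fails with this pattern, because the pattern
# expects dashes and a literal 'T' that the sample value does not contain.
mismatch_result = try_parse("4/6/2017 13:25", "%m-%d-%YT%H:%M")

# A pattern that mirrors the sample value (slashes and a space) parses.
ok_result = try_parse("4/6/2017 13:25", "%m/%d/%Y %H:%M")

print(column_result, mismatch_result, ok_result)
```

Inside Spark, the parsing would more likely go through a column function such as pyspark.sql.functions.unix_timestamp or to_timestamp, which take Java-style patterns (for example 'M/d/yyyy HH:mm') rather than strptime directives. That pattern is an untested assumption based on the sample value.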
I have spent a day trying several approaches to this problem without making any progress. Any suggestions would be much appreciated. Thanks.