My question is about how to convert a Unicode date/time string to a Python datetime in PySpark.

I have written a machine learning program using PySpark in a Databricks / AWS environment. All my code works well except when converting a Unicode string (`u'4/6/2017 13:25'`) to a Python datetime. I want to determine the difference in time between today and the purchase date.

My code is:

historicalE = historicalD.withColumn('new_purchase_date', f.date_format(historicalD.purchase_date.cast(dataType=t.TimestampType()), "%m-%d-%YT%H:%MZ"))

After running the code, every value in `new_purchase_date` is `None`.

In another attempt, I tried:

historicalE = historicalD.withColumn('new_purchase_date', datetime.datetime.strptime(historicalD.purchase_date, '%m-%d-%YT%H:%M'))

This raised an error saying the argument must be a string, not a column.

I have worked on this problem for a day, trying several solutions, and am not making any progress. Your suggestions are much appreciated. Thanks.

Hashir Malik
  • Possible duplicate of [Convert pyspark string to date format](https://stackoverflow.com/questions/38080748/convert-pyspark-string-to-date-format) – pault Feb 05 '19 at 16:52
  • Is `purchase_date.cast(dataType=t.TimestampType())` alone enough? You are receiving `None` because the date format you provided does not match the data. The cast alone should be enough and is a more relaxed condition; let me know if you still get `None` ;) – Vzzarr Feb 05 '19 at 16:58
  • Thanks for your reply. I continue to get "None" using the following code: – Bill Bardwell Feb 07 '19 at 14:21
  • Thank you for your reply. Unfortunately, I am still getting `None` after using your suggested change. Also, for some reason, the purchase date is now a plain string rather than a Unicode string. The current format is: `purchase_date='4/6/2017 13:25'`. I ran your code with no format and tried several modified formats, but I continue to get `None`. All suggestions are welcome. – Bill Bardwell Feb 07 '19 at 23:34
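
For reference, the sample value `'4/6/2017 13:25'` uses slashes and a space, while the patterns tried in the question use hyphens and a `T`, so the two cannot match. The following is a minimal plain-Python sketch, not the asker's code: the names `PURCHASE_FORMAT`, `parse_purchase_date`, and `days_since` are hypothetical, and it assumes every value looks like the sample. It works on ordinary strings, which is also why passing a Spark column straight to `strptime` fails. In Spark itself, `pyspark.sql.functions.to_timestamp` (Spark 2.2 and later) takes a Java-style pattern instead, probably something like `'M/d/yyyy H:mm'` for this data, though that has not been checked against the asker's cluster.

```python
from datetime import datetime

# strptime pattern matching the sample value: month/day/year, a space, 24-hour time.
PURCHASE_FORMAT = "%m/%d/%Y %H:%M"


def parse_purchase_date(text):
    """Parse a string such as '4/6/2017 13:25'; return None if it does not match."""
    try:
        return datetime.strptime(text, PURCHASE_FORMAT)
    except (TypeError, ValueError):
        # TypeError: input is not a string (e.g. None); ValueError: format mismatch.
        return None


def days_since(purchase, now=None):
    """Whole days elapsed between a purchase datetime and `now` (default: current time)."""
    now = now or datetime.now()
    return (now - purchase).days


purchased = parse_purchase_date(u"4/6/2017 13:25")
print(purchased)  # 2017-04-06 13:25:00
print(days_since(purchased))  # depends on the day the code is run

# A value written in the hyphen/T layout does not match and yields None.
print(parse_purchase_date("04-06-2017T13:25"))  # None
```

A function like `parse_purchase_date` could in principle be wrapped in a Spark UDF, but a built-in column function is usually the simpler route when one fits.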
