
I have a dataset which contains multiple columns and rows.

Currently, the Date_Time column is of string type, and I want to convert it to a timestamp format for further processing.

I tried the code below, which returns null:

from pyspark.sql.types import TimestampType

df = df.withColumn('Date_Time', df['Date_Time'].cast(TimestampType()))
df.show()

I also tried some of the solutions from the question linked below, but none of them work; they all return null in the end.

Convert pyspark string to date format
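
For reference, here is a minimal, self-contained reproduction of the null result. The sample value 01/11/2019 05:00:00 PM is an assumption on my part, chosen to match the dd/MM/yyyy hh:mm:ss a pattern discussed in the answer below; cast() only recognizes Spark's default timestamp patterns (such as yyyy-MM-dd HH:mm:ss), so a string in any other format comes back as null:

from pyspark.sql import SparkSession
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample row whose format matches dd/MM/yyyy hh:mm:ss a
df = spark.createDataFrame([('01/11/2019 05:00:00 PM',)], ['Date_Time'])

# cast() cannot parse this non-standard pattern, so the result is null
df.withColumn('Date_Time', df['Date_Time'].cast(TimestampType())).show()

# +---------+
# |Date_Time|
# +---------+
# |     null|
# +---------+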


1 Answer


Since your date format is non-standard, you need to use to_timestamp and specify the corresponding format:

import pyspark.sql.functions as F

df2 = df.withColumn('Date_Time', F.to_timestamp('Date_Time', 'dd/MM/yyyy hh:mm:ss a'))
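
With the sample row above, the parse now succeeds, and show() renders the value using Spark's default timestamp display (the same value mentioned in the comments below):

df2.show()

# +-------------------+
# |          Date_Time|
# +-------------------+
# |2019-11-01 17:00:00|
# +-------------------+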
  • But why is it converted to the format "2019-11-01 17:00:00" when we specified dd/MM/yyyy? Is there any way to specify the output date format? – Mohit Sharma Jan 15 '21 at 10:07
  • @MohitSharma if you want to specify the date format, you can use `F.date_format('Date_Time', 'dd/MM/yyyy')`, for example. Note that this will convert the column type to string type. Also note that timestamp types are internally stored as integers, and the format shown in `df.show()` does not represent how it is stored. – mck Jan 15 '21 at 10:09
  • Okay, got it. Is there any way I can get the results in a format such as dd/MM/yyyy hh:mm:ss? – Mohit Sharma Jan 15 '21 at 10:11
  • Sure, just use `df2.withColumn('Date_Time', F.date_format('Date_Time', 'dd/MM/yyyy hh:mm:ss'))`. Note that you need to convert to timestamp type first, before you can use `date_format`. That's why I operated on `df2` in this comment; a combined sketch follows below. – mck Jan 15 '21 at 10:13
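
Putting the comment thread together, a combined sketch looks like this (the df3 name is illustrative; the final format string dd/MM/yyyy hh:mm:ss is the one asked about above, and note that hh is the 12-hour clock, so add a if you also need an AM/PM marker):

import pyspark.sql.functions as F

# Step 1: parse the non-standard string into a proper timestamp column
df2 = df.withColumn('Date_Time', F.to_timestamp('Date_Time', 'dd/MM/yyyy hh:mm:ss a'))

# Step 2: render the timestamp back as a formatted string
# (date_format returns a string column, as noted in the comments)
df3 = df2.withColumn('Date_Time', F.date_format('Date_Time', 'dd/MM/yyyy hh:mm:ss'))
df3.show()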