1

My datetime is in following format:

           visit_dts |web_datetime|
+--------------------+------------+
| 5/1/2018 3:48:14 PM|        null|

Based on answer provided here, I am using following query to convert string into datetime format:

web1 = web1.withColumn("web_datetime", from_unixtime(unix_timestamp(col("visit_dts"), "%mm/%dd/%YY %I:%M:%S %p")))

But it is not working. Any lead would be great.

Ronak Jain
  • 3,073
  • 1
  • 11
  • 17
saurav shekhar
  • 596
  • 1
  • 6
  • 17

2 Answers2

3

You can do like below to achieve your result

from pyspark.sql import Row

df = sc.parallelize([Row(visit_dts='5/1/2018 3:48:14 PM')]).toDF()

import pyspark.sql.functions as f

web = df.withColumn("web_datetime", f.from_unixtime(f.unix_timestamp("visit_dts",'MM/dd/yyyy hh:mm:ss aa'),'MM/dd/yyyy HH:mm:ss'))

This should give you

web.show()

+-------------------+-------------------+
|          visit_dts|       web_datetime|
+-------------------+-------------------+
|5/1/2018 3:48:14 PM|05/01/2018 15:48:14|
+-------------------+-------------------+
User12345
  • 5,180
  • 14
  • 58
  • 105
  • In spark 3.x the format for AM/PM has changed from `aa` to `a`. So for example the expression in this answer would now look like ```web = df.withColumn("web_datetime", f.from_unixtime(f.unix_timestamp("visit_dts",'MM/dd/yyyy hh:mm:ss a'),'MM/dd/yyyy HH:mm:ss'))``` – the_cat_lady Jan 26 '23 at 16:10
3

This works perfect for me

from pyspark.sql.functions import to_timestamp
df=spark.read.csv(fp,header=True)
df=df.withColumn('time',to_timestamp("Date","MM/dd/yyyy hh:mm:ss a"))
df.select("Case Number",'time','Date').show(5,False)

This gives

+-----------+-------------------+----------------------+
|Case Number|time               |Date                  |
+-----------+-------------------+----------------------+
|HM558301   |2004-01-01 09:00:00|01/01/2004 09:00:00 AM|
|HM559628   |2001-07-01 00:01:00|07/01/2001 12:01:00 AM|
|HM561700   |2002-08-01 12:00:00|08/01/2002 12:00:00 PM|
|HM566703   |2004-07-13 12:00:00|07/13/2004 12:00:00 PM|
|HM541269   |2004-07-01 06:00:00|07/01/2004 06:00:00 AM|
+-----------+-------------------+----------------------+
only showing top 5 rows