
I have a dataframe like the one below and I am using PySpark 2.4.

Name    doj
kevin   08/15/2013
George  06/21/2014

df.printSchema()
#root
# |-- Name: string (nullable = true)
# |-- doj: string (nullable = true)

I would like to convert doj to the yyyy-MM-dd format and make sure doj is converted to DateType instead of String using PySpark. Is there a specific function available for this? I appreciate your response.

2 Answers


Use the to_date() function with the source pattern MM/dd/yyyy.

df.show()
#+------+----------+
#|  Name|       doj|
#+------+----------+
#| Kevin|08/15/2013|
#|George|06/21/2014|
#+------+----------+

from pyspark.sql.functions import col, to_date

df.withColumn("doj",to_date(col("doj"),'MM/dd/yyyy')).show()
#+------+----------+
#|  Name|       doj|
#+------+----------+
#| Kevin|2013-08-15|
#|George|2014-06-21|
#+------+----------+
df.withColumn("doj",to_date(col("doj"),'MM/dd/yyyy')).printSchema()
#root
# |-- Name: string (nullable = true)
# |-- doj: date (nullable = true)
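
For completeness, a minimal, self-contained sketch of the same approach; it assumes a local SparkSession named spark and rebuilds the sample data from the question, so the variable names are illustrative only.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.getOrCreate()

# Recreate the sample dataframe from the question
df = spark.createDataFrame(
    [("Kevin", "08/15/2013"), ("George", "06/21/2014")],
    ["Name", "doj"],
)

# Strings that do not match the MM/dd/yyyy pattern are parsed to null rather than raising an error
df = df.withColumn("doj", to_date(col("doj"), "MM/dd/yyyy"))
df.show()
df.printSchema()

Once the column is DateType it displays as yyyy-MM-dd by default; if you later need it back as a formatted string, date_format(col("doj"), "yyyy-MM-dd") does that.
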
I did something similar in plain Python:

def dateconv(x):
    # Guard against missing values instead of returning the string 'null'
    if x is None:
        return None
    # x is expected to be a date/datetime object; format it as yyyy-MM-dd
    return x.strftime('%Y-%m-%d')

dateconv(doj)
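
If you do want to apply a plain-Python converter like this to the dataframe column, one way is to wrap it in a UDF. This is only a sketch (the built-in to_date() in the other answer is usually faster than a UDF), and dateconv_udf is just an illustrative name.

from datetime import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType

@udf(returnType=DateType())
def dateconv_udf(x):
    # Keep nulls as nulls and parse MM/dd/yyyy strings into real dates
    if x is None:
        return None
    return datetime.strptime(x, "%m/%d/%Y").date()

df.withColumn("doj", dateconv_udf(col("doj"))).printSchema()
#root
# |-- Name: string (nullable = true)
# |-- doj: date (nullable = true)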

  • Possible duplicate of https://stackoverflow.com/questions/38080748/convert-pyspark-string-to-date-format – ashakshan Sep 09 '20 at 22:29