Suppose I have a dataframe df with a column birth_date that holds values such as '123', '5345', '234345', etc.
I first read the dataframe from a CSV using:
df = sqlContext.read.csv('s3://path/to/file', header=True)
Every column is read as StringType(), so I first convert the birth_date column to LongType() (I have to use LongType for other reasons; I know IntegerType would also work, but let's not go into that right now) using the following:
from pyspark.sql.types import LongType
df = df.withColumn('birth_date', df['birth_date'].cast(LongType()))
Now, how do I convert the birth_date column to DateType, treating the integer values it holds as a number of days to add to the date "1960-01-01"?
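To make the desired result concrete, here is a minimal plain-Python sketch (not Spark code) of the arithmetic I am after: each integer is treated as a day offset from 1960-01-01. The names EPOCH and days_to_date are made up for illustration only.

```python
from datetime import date, timedelta

# Assumption: every integer is a day offset from this base date.
EPOCH = date(1960, 1, 1)


def days_to_date(days):
    """Return the calendar date `days` days after EPOCH (hypothetical helper)."""
    return EPOCH + timedelta(days=int(days))


# Sample values from the column, still strings as read from the CSV.
for raw in ('123', '5345', '234345'):
    print(raw, '->', days_to_date(raw).isoformat())
```

For the sample values, '123' should become 1960-05-03 and '5345' should become 1974-08-20. I want the same result per row in the Spark column.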
I tried the date_add function with the following command, but I am very new to PySpark and don't understand how column operations behave differently, so I am stuck.
Here is what I tried to do:
df = df.withColumn('birth_date', date_add("1960-01-01", 'birth_date'))
and I am getting this error
py4j.Py4JException: Method date_add([class org.apache.spark.sql.Column, class java.lang.String]) does not exist
I am running everything in PySpark on Databricks, if that matters at all.