I have a PySpark DataFrame like the one below:
+------+----------+----+
|ID    |date      |flag|
+------+----------+----+
|123456|2015-04-21|null|
|234567|2017-04-18|null|
|345678|2009-06-25|null|
|456789|2001-11-07|null|
|567890|2016-10-02|null|
+------+----------+----+
I want to change every value in the date column to the first day of its month (YYYY-MM-01), like this:
+------+----------+----+
|ID    |date      |flag|
+------+----------+----+
|123456|2015-04-01|null|
|234567|2017-04-01|null|
|345678|2009-06-01|null|
|456789|2001-11-01|null|
|567890|2016-10-01|null|
+------+----------+----+
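To be explicit about the per-row result I expect, here is a minimal plain-Python sketch (not Spark code). The helper name first_of_month is hypothetical and only illustrates the mapping from each input date to its output; it is not meant as the way to do this in Spark.

```python
from datetime import date


def first_of_month(d: date) -> date:
    """Return a date with the same year and month as d, and the day set to 1."""
    return d.replace(day=1)


# Sample dates copied from the first three rows of the input table.
sample = [date(2015, 4, 21), date(2017, 4, 18), date(2009, 6, 25)]

# Each row keeps its own year and month; only the day changes.
truncated = [first_of_month(d) for d in sample]

for before, after in zip(sample, truncated):
    print(before.isoformat(), "->", after.isoformat())
```

The point is that each row should keep its own year and month, so the output rows should still differ from one another.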
In PySpark, I tried the following:
from pyspark.sql import functions as f
df = df.withColumn("date", f.trunc("date", "month"))
But this seems to corrupt the column: every row ends up with the same date. How can I convert each value in my PySpark column from its original YYYY-MM-dd to YYYY-MM-01?