First check the schema of your table, because the field can be stored either as a date or as a string.
from pyspark.sql.functions import expr

# DataFrame with the date as a string and the date as a date
# (`spark` is the active SparkSession, as in a notebook)
df = (spark.createDataFrame([{"date_str": "2021-11-02"}])
.withColumn("date_date", expr(" to_date(date_str) "))
)
df.show()
df.schema
>>Out[1]:
>>+----------+----------+
>>|  date_str| date_date|
>>+----------+----------+
>>|2021-11-02|2021-11-02|
>>+----------+----------+
>>Out[2]: StructType(List(StructField(date_str,StringType,true),StructField(date_date,DateType,true)))
The output above shows that both the string column and the date column are displayed as YYYY-MM-DD. Now let's convert both to YYYYMMDD (Spark pattern 'yyyyMMdd'):
df_converted = (df
.withColumn("date_str_converted", expr(" date_format(to_date(date_str), 'yyyyMMdd') "))
.withColumn("date_date_converted", expr(" date_format(date_date, 'yyyyMMdd') "))
)
df_converted.show()
>>Out[3]:
>>+----------+----------+------------------+-------------------+
>>|  date_str| date_date|date_str_converted|date_date_converted|
>>+----------+----------+------------------+-------------------+
>>|2021-11-02|2021-11-02|          20211102|           20211102|
>>+----------+----------+------------------+-------------------+
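If you would rather not check the schema by hand, the choice between the two expressions can be made from the column's type name. The following is only a minimal sketch: `yyyymmdd_expr` is a hypothetical helper (not part of PySpark), and it assumes the type names are the ones reported by `df.dtypes`, such as "string" and "date".

```python
def yyyymmdd_expr(column: str, dtype: str) -> str:
    """Build a Spark SQL expression that renders `column` as yyyyMMdd.

    Hypothetical helper, not a PySpark API. `dtype` is assumed to be the
    type name as reported by `df.dtypes`, e.g. "string" or "date".
    """
    if dtype == "string":
        # Strings are parsed into a date first, then formatted.
        return f"date_format(to_date({column}), 'yyyyMMdd')"
    if dtype in ("date", "timestamp"):
        # Date and timestamp columns can be formatted directly.
        return f"date_format({column}, 'yyyyMMdd')"
    raise ValueError(f"Unsupported type for {column}: {dtype}")


# The two column types from the DataFrame above.
print(yyyymmdd_expr("date_str", "string"))
print(yyyymmdd_expr("date_date", "date"))
```

With a live session, the returned string can be passed to expr, for example df.withColumn("date_str_converted", expr(yyyymmdd_expr("date_str", dict(df.dtypes)["date_str"]))).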