I have a column 'start_date' that contains integers such as 37823. This happened when I used the xlrd library to convert an xlsx file to csv, so the date '2003/07/21' became 37823.
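To show the mapping I am after, here is a minimal plain-Python sketch of the arithmetic. It assumes the workbook uses Excel's 1900 date system, in which the integer is a count of days from 1899-12-30; a workbook in the 1904 date system would need a different base date. The names `EXCEL_EPOCH` and `excel_serial_to_date` are just illustrative.

```python
from datetime import date, timedelta

# Assumption: the workbook uses Excel's 1900 date system, for which
# serial numbers count days from 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel serial day number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_date(37823))  # 2003-07-21
```

I would guess the PySpark equivalent is some form of adding the column's value as days to that base date, for example with the `date_add` SQL function, but I have not got that working in Glue.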
I have gone through the xlrd documentation and understand there are several ways to convert the number back to a date. However, I need to do this conversion with PySpark in an AWS Glue ETL job. Any suggestions?
I tried the to_date and date_format functions, but neither worked.