I am stuck in my project, where I have to convert Excel files coming from different clients and convert them to CSV while preserving the number and date formats as is. There can be different date formats in the same sheets, and I need to preserve them as is. Please find the detailed explanation below:
Source Excel:
Date_1 | Date_2 |
---|---|
08/15/2023 | 17-08-2023 |
08/16/2023 | 18-08-2023 |
I tried using pd.read_excel(file_name, dtype='str')
and converted them to CSV using pd.to_csv()
, but the results I got were:
Date_1 | Date_2 |
---|---|
2023-08-15 00:00:00 | 2023-08-17 00:00:00 |
2023-08-16 00:00:00 | 2023-08-18 00:00:00 |
If I used pd.read_excel()
without dtype argument, I would get the same result, without the time component (2023-08-15 and so on)
I also tried using openpyxl, but it doesn't support .xls files, and it reads the dates as floating point numbers.
Tried xlrd and csv libraries, but even these libraries read dates as floating point integers.
I understand that Excel saves dates as floating point numbers and applies formatting on top of it, but is there a way I can copy the exact format of the dates from XLS or XLSX files and convert them to CSV? Just like a copy paste operation without messing around with any formats of the data?