In Python, I am reading a .xlsb file with pandas. However, the date column is being read as a five-digit integer rather than an actual date. Whenever I open the xlsb file and save the sheet as a separate xlsx file, it reads the date column fine. However, I would like to automate this whole process so opening and saving multiple sheets into xlsx files would be in-efficient.
Overall, I do not care if the final type is either datetime or string. I just dont want this integer. Below is my code:
# Import Libraries
from pdb import pm
import pandas as pd
import pyxlsb
from sqlalchemy import create_engine
import urllib
import pyodbc
# Read Excel File
df = pd.read_excel(r'\\filepath\filename.xlsb', sheet_name='my_sheet', index_col=0, engine='pyxlsb')
df['Date']
Date |
---|
45291 |
44561 |
44561 |
43830 |
44196 |
df['Date'].dtype()
returns
dtype('int64')
Here is the actual dates when I save the sheet separately and import as an xlsx file:
12/31/2023
1 12/31/2021
2 12/31/2021
3 12/31/2019
4 12/31/2020
...
405 08/31/2023
406 09/30/2023
407 09/30/2023
408 08/31/2023
409 12/31/2023