0

In Python, I am reading a .xlsb file with pandas. However, the date column is being read as a five-digit integer rather than an actual date. Whenever I open the xlsb file and save the sheet as a separate xlsx file, it reads the date column fine. However, I would like to automate this whole process so opening and saving multiple sheets into xlsx files would be in-efficient.

Overall, I do not care if the final type is either datetime or string. I just dont want this integer. Below is my code:

# Import Libraries
from pdb import pm
import pandas as pd
import pyxlsb
from sqlalchemy import create_engine
import urllib
import pyodbc


# Read Excel File
df = pd.read_excel(r'\\filepath\filename.xlsb', sheet_name='my_sheet', index_col=0, engine='pyxlsb')

df['Date']
Date
45291
44561
44561
43830
44196
df['Date'].dtype()

returns

dtype('int64')

Here is the actual dates when I save the sheet separately and import as an xlsx file:

      12/31/2023
1      12/31/2021
2      12/31/2021
3      12/31/2019
4      12/31/2020
          ...    
405    08/31/2023
406    09/30/2023
407    09/30/2023
408    08/31/2023
409    12/31/2023
Josh
  • 43
  • 1
  • 7

1 Answers1

0

Have you tried,

pd.to_datetime('your_datetime_columns',origin='1899-12-30')

This will give you your wanted format, also this is a duplicate so i flagged your question.

INGl0R1AM0R1
  • 1,532
  • 5
  • 16