2

I have a 'some.xlsb' file with some 10 columns, out of which 2 are DateTime column.

When I load using pandas the date-time column is parsed in a different form.

Explanations:

where DateTime value corresponding to 4/10/2021 11:50:24 AM - read as 44296.5

Below is the code I tried.

goods_df = pd.read_excel('some.xlsb',
                   engine='pyxlsb', sheet_name='goods_df')

goods_df_header = goods_df.iloc[1]
goods_df.columns = goods_df_header #set the header row as the df header
goods_df= goods_df[2:]
goods_df.head(2)
Tamil Selvan
  • 1,600
  • 1
  • 9
  • 25
Danish
  • 2,719
  • 17
  • 32
  • 1
    you might want to do the parsing to datetime after import, see e.g. [Convert Excel style date with pandas](https://stackoverflow.com/q/38454403/10197418). – FObersteiner Jun 17 '21 at 08:34

1 Answers1

2

When you read xlsb file using pandas you will get excel time float value because xlsb convert datetime object into an float value before storing.

According to Microsoft 44296.5 means 44296.5 days passed since jan 1st 1900.

You need convert this into epoch and then date by using below formula( epoch value= number of sec passed since jan 1st 1970 00:00:00 ).

a = datetime.datetime.strftime((int(<datevalue from excel>)*86400)-2207520000, "%m/%d/%Y")

Or you can save this xlsb as xlsx and read it you will get exact datetime object.

ZygD
  • 22,092
  • 39
  • 79
  • 102
Aditya Sai
  • 21
  • 2
  • note, that this is not exact, because `2207520000` corresponds to 70 years in seconds assuming 365 days per year (which is not correct). The exact number one can find in https://stackoverflow.com/a/34721924/9640384 – Mischa Lisovyi Apr 20 '22 at 06:54