0

I'm trying to read binary Excel files using read_excel method in pandas with pyxlsb engine as below:

import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')

If the xlsb file is like this file (Right now, I'm sharing this file via WeTransfer, but if there is a better way to share files on StackOverflow, let me know), the returned dataframe is filled with NaN's. I suspected that it might be because the file was saved with active cell pointing at the empty cells after the data originally. So I tried this:

import pandas as pd
with open('test.xlsb', 'rb') as data:
    data.seek(0,0)
    df = pd.read_excel(data, engine='pyxlsb')

but it still doesn't seem to work. I also tried reading the data from byte number 0 (from the beginning), writing it into a new file, 'test_1.xlsb', and finally reading it with pandas, but that doesn't work.

with open('test.xlsb','rb') as data:
    data.seek(0,0)
    with open('test_1.xlsb','wb') as outfile:
        outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')

If anyone has suggestion as to what might be going on and how to resolve it, I'd greatly appreciate the help.

user1330974
  • 2,500
  • 5
  • 32
  • 60
  • 1
    what do you mean returned with NaN ? if you are using Windows, I would suggest using `xlwings` which generally handles xlsb better than pyxslb – Umar.H May 17 '20 at 02:35
  • @Datanovice, I meant the resulting dataframe only has empty rows. I'll give `xlwings` a try tomorrow (I tried `pyxlsb` because that's what Pandas documentation suggested). Thank you. – user1330974 May 17 '20 at 02:54
  • 1
    Consider even a [`pyodbc` solution](https://stackoverflow.com/a/61124285/1422451) where you can query *any* Excel workbook type with SQL using `read_sql`. – Parfait May 17 '20 at 04:18
  • @Parfait Very interesting approach to read Excel files and I'll certainly be using it when I use Windows. For this problem above, my environment is a Azure Batch instance with Linux OS. Thank you for sharing this unique (and very likely performant) way to read large Excel files, which has been a pain to do in Pandas. – user1330974 May 17 '20 at 13:57

0 Answers0