4

I'm trying to use pandas.read_excel to read in .xls files. It succeeds on most of my .xls files, but then for some it errors out with the following error message:

Unsupported format, or corrupt file: Expected BOF record; found '\x00\x05\x16\x07\x00\x02\x00\x00'

I've been trying to research why this is happening to some, but not all files. The xlrd version is 1.0.0. I tried to manually read in with xlrd.open_workbook and I get the same result.

Does anyone know what file type this BOF record refers to?

DLee
  • 1
    Do the files actually open in Excel/OO? If they do, and you save them out again, can you then read them using `xlrd`? – Jon Clements Aug 15 '17 at 20:06
  • 1
    I opened with Excel (which works fine) and saved the file. Reading with xlrd I get the same error code with BOF record ('\x00\x05\x16\x07\x00\x02\x00\x00') – DLee Aug 15 '17 at 20:16

3 Answers

6

This error can have several causes, but the most likely one is the Excel file itself. A file pulled from a reporting portal, in particular, can be corrupt. The best first step is to open the file in Excel, save it as a new .xls file, and then retry pandas.read_excel.

Lemme know if it works.

Naufal
2

I solved this problem by loading the file with pd.read_table (which loads everything into one column):

df = pd.read_table('path/to/xls_file/' + 'my_file.xls')

then I split this column with

df = df['column_name'].str.split("your_separator", expand=True)
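
The two steps above can be tried end to end with a small sketch. It assumes the ".xls" file is really delimited text; the sample data, the ";" separator, and the variable names are made up for illustration and are not from the original files.

```python
import io

import pandas as pd

# Hypothetical stand-in for a text file that was saved with an .xls extension.
raw = "id;name;score\n1;ann;3.5\n2;bob;4.0\n"

# read_table splits on tabs by default, so each ';'-delimited line
# ends up in a single column.
df = pd.read_table(io.StringIO(raw))

# That single column is named after the whole header line.
header = df.columns[0]

# Split the column on the real separator and restore the header names.
split_df = df[header].str.split(";", expand=True)
split_df.columns = header.split(";")
```

All values in split_df are strings after the split. If the separator is known up front, passing it directly, as in pd.read_csv(path, sep=";"), is probably simpler.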
Ekat Sim
-2

Please check that the file has the right extension, either xlsx or csv. A wrong file extension may cause this issue.
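
One way to check what a file really contains, regardless of its extension, is to look at its first few bytes. The sketch below is only a rough guess at the format, not a full detector. The helper names and the list of signatures are my own choices. For what it's worth, the bytes in the error message from the question match the AppleDouble magic number (0x00051607) followed by version 2, which would suggest a macOS sidecar file rather than a workbook, but I can't confirm that for the asker's files.

```python
# Leading-byte signatures for a few formats that show up with an .xls extension.
# The list is illustrative, not exhaustive.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (typical legacy .xls)"),
    (b"PK\x03\x04", "ZIP container (possibly .xlsx)"),
    (b"\x00\x05\x16\x07", "AppleDouble file"),
    (b"\x00\x05\x16\x00", "AppleSingle file"),
]


def sniff_bytes(head):
    """Return a rough guess at the format from the first bytes of a file."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    if head.lstrip().startswith(b"<"):
        return "markup (HTML or XML)"
    return "unknown (possibly delimited text)"


def sniff_file(path):
    """Read the first 8 bytes of the file at `path` and guess its format."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(8))
```

Calling sniff_file on one of the failing files and on one that loads fine should show whether they are actually the same kind of file.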