25

I am trying to open an xlsx file and print its contents, but I keep running into an error. This is my code:

import xlrd
book = xlrd.open_workbook("file.xlsx")
print "The number of worksheets is", book.nsheets
print "Worksheet name(s):", book.sheet_names()
print

sh = book.sheet_by_index(0)

print sh.name, sh.nrows, sh.ncols
print

print "Cell D30 is", sh.cell_value(rowx=29, colx=3)
print

for rx in range(5):
    print sh.row(rx)
    print

It prints out this error:

raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found    '\xff\xfeT\x00i\x00m\x00'

Thanks

user2353003

9 Answers

47

If you use read_excel() to read a .csv file, you will get this error:

XLRDError: Unsupported format, or corrupt file: Expected BOF record;

To read a .csv file, use read_csv() instead, like this:

import pandas as pd
df1 = pd.read_csv("filename.csv")
Gonçalo Peres
Mike Chan
37

There is also a third reason: the file is already open in Excel. That generates the same error.

Richard Erickson
BStew
22

The error message relates to the BOF (Beginning of File) record of an XLS file. However, the example shows that you are trying to read an XLSX file.

There are 2 possible reasons for this:

  1. Your version of xlrd is old and doesn't support reading xlsx files.
  2. The XLSX file is encrypted and thus stored in the OLE Compound Document format, rather than a zip format, making it appear to xlrd as an older format XLS file.

Double check that you are in fact using a recent version of xlrd. Opening a new XLSX file with data in just one cell should verify that.

However, I would guess that you are encountering the second condition and that the file is encrypted, since you state above that you are already using xlrd version 0.9.2.

XLSX files are encrypted if you explicitly apply a workbook password but also if you password protect some of the worksheet elements. As such it is possible to have an encrypted XLSX file even if you don't need a password to open it.
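The ZIP-versus-OLE distinction above can be checked by reading the first few bytes of the file. Below is a minimal sketch, assuming the usual signatures for ZIP archives (`PK\x03\x04`) and OLE Compound Documents (`D0 CF 11 E0 A1 B1 1A E1`). The function name `sniff_container` is made up for illustration and is not part of xlrd.

```python
# Leading bytes ("magic numbers") of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_container(path):
    """Return 'zip', 'ole', or 'unknown' based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"      # what a normal, unencrypted .xlsx looks like
    if head.startswith(OLE_MAGIC):
        return "ole"      # a legacy .xls, or possibly an encrypted .xlsx
    return "unknown"      # neither container: text, HTML, or something else
```

If this returns "ole" for a file named .xlsx, encryption is a plausible cause. If it returns "unknown", the file is probably not an Excel workbook at all.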

Update: See @BStew's answer for a third, more probable reason: the file is open in Excel.

jmcnamara
15

You can get this error when the xlsx file is actually html; you can open it with a text editor to verify this. When I got this error I solved it using pandas:

import pandas as pd
df_list = pd.read_html('filename.xlsx')
df = pd.DataFrame(df_list[0])
Pluto
3

In my case, someone gave me an Excel file ending with extension ".xls". I tried parsing it with xlrd, and got this error:

xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found "blar blar blar"

After working on it for some time, I found that the .xls file was actually a text file. The sender hadn't bothered to create a real Excel binary file and had just put ".xls" on the end of a text file's name.

It may be worth opening the file with a text editor to make sure it really is an Excel file. This could have saved me an hour.
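The same check can be done from Python by looking for a byte-order mark (BOM) at the start of the file, since a BOM marks text and never appears at the start of a real workbook. This is only a rough sketch: `peek_text` is a hypothetical helper name, and it covers just the UTF-8 and UTF-16 BOMs. The sample bytes are the ones quoted in the question's error message, which begin with the UTF-16 little-endian BOM `\xff\xfe`.

```python
import codecs

# Byte-order marks that suggest a text file rather than a binary workbook.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def peek_text(head):
    """Decode leading bytes as text if they start with a known BOM, else None."""
    for bom, encoding in BOMS:
        if head.startswith(bom):
            # errors="replace" tolerates a sample cut off mid-character.
            return head.decode(encoding, errors="replace")
    return None


# The bytes xlrd reported in the question's error message.
print(repr(peek_text(b"\xff\xfeT\x00i\x00m\x00")))  # 'Tim'
```

In practice you would pass in the first few dozen bytes read from the file in binary mode. Getting readable text back is a strong hint that the file should be parsed as text or CSV, not with xlrd.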

Ken
3

To anyone who is reading this post today: the following solution actually helped me. https://stackoverflow.com/a/46214958/9642876

The XLSX file that I was trying to read was created by reporting software. Neither pandas nor xlrd could read it, but I could open it in Microsoft Excel. I re-saved the file under a different name, and now both xlrd and pandas can read it.

It may also work if you just re-save with the same name, although I haven't tested this.

0

In my case, the issue was with the shared folder itself.

CASE IN POINT: I have a shared folder on WIN2012 Server where the user drops the .xlsx file and then uses my python script to load that xlsx file into a database table.

Even though the user deleted the old file and put in the file that was to be loaded, the BOF error kept mentioning a byte string containing the name of the user. The user's name appeared nowhere inside the xlsx file, in any worksheet. On top of that, when I copied the .xlsx into a newly created folder and ran the script against that new folder, it worked.

So in the end, I deleted the shared folder and realized that 5 items got deleted even though only 1 item was visible to me and the user. I think it is down to my lack of windows administration skills but that was the culprit.
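One possible explanation, which is an assumption and not something confirmed in this case: while a workbook is open, Excel keeps a hidden owner file named `~$` plus the workbook name in the same folder, and that file records the name of the user who has it open. A script that picks up every `*.xlsx` in the folder would then try to parse the owner file too. A minimal sketch of skipping such files (the helper name `real_workbooks` is hypothetical):

```python
from pathlib import Path


def real_workbooks(folder):
    """List .xlsx files in folder, skipping Excel's '~$' owner (lock) files."""
    return sorted(
        p for p in Path(folder).glob("*.xlsx")
        if not p.name.startswith("~$")
    )
```

This only filters by file name, so it does not help if the workbook itself is damaged or is not really an Excel file.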

Ali Khan
0

I got the same error message. It seemed weird to me because the script works for xlsx files in another folder, and the files are almost the same.

I still don't know why this happened, but in the end I copied all the Excel files to another folder and the script worked. It is an option to try if none of the above suggestions work for you.

jxshen
0

This also happens when the file used by the script is open in the background.

Sailendra Pinupolu