0

I'm trying to read in some excel files but they seem to have formatting errors. I can't understand the error messages when I try to read the files and have no idea how to fix these. Please help.

I've tried reading in a new excel document I've just created and have had no problems.

df = pd.read_excel('2013-ID19.xls', index_col=0)
print(df.head())

File "exportDataToPandas.py", line 7, in <module>
    pd.read_excel('2013-ID19.xls', index_col=0)
  File "/Users/cj/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/Users/cj/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/Users/cj/anaconda3/lib/python3.7/site-packages/pandas/io/excel.py", line 307, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/Users/cj/anaconda3/lib/python3.7/site-packages/pandas/io/excel.py", line 394, in __init__
    self.book = xlrd.open_workbook(self._io)
  File "/Users/cj/anaconda3/lib/python3.7/site-packages/xlrd/__init__.py", line 157, in open_workbook
    ragged_rows=ragged_rows,
  File "/Users/cj/anaconda3/lib/python3.7/site-packages/xlrd/book.py", line 92, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "/Users/cj/anaconda3/lib/python3.7/site-packages/xlrd/book.py", line 1278, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "/Users/cj/anaconda3/lib/python3.7/site-packages/xlrd/book.py", line 1272, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<table><'
Ayoub ZAROU
  • 2,387
  • 6
  • 20

2 Answers2

0

You probably saved the file in a wrong format. Open the xls file and save as .csv instead. Something is corrupted in the original file.

M.F
  • 345
  • 3
  • 15
0

Apparently this is an upstream issue with the xlrd library, open here. As the issue is still unresolved, there might not be a straight up answer. A few options:

  1. Try running directly from xlrd
import xlrd
book = xlrd.open_workbook("2013-ID19.xls")
  1. if it does not work, try updating your xlrd version.

  2. If you have access to the file, perhaps try to save as a newer version of Excel, as suggested in this answer:

What you have is most probably an "XML Spreadsheet 2003 (*.xml)" file ... You can also check this by opening the file with Excel and then click on Save As and look at the file-type that is displayed. While you are there, save it as an XLS file so that your xlrd can read it

realr
  • 3,652
  • 6
  • 23
  • 34
  • Thank you both for replying. I have the most up to date versions of python and xlrd. I've tried saving both as an excel workbook (.xslx) and .csv but I'm still getting the same error messages. I can open the files and look at them in excel without a problem. Any more ideas? I'm on a mac if that makes a difference? – user11815898 Jul 21 '19 at 19:40
  • I would try to use some tweaks when reading the file using `pd.read_csv` to see if it is a specific cell value that is causing the issue, such as `nrows`, `skipfooter`, `skiprows` and `encoding='utf8'`. Maybe by skipping the last row[s] or first row[s] it works well – realr Jul 21 '19 at 19:47
  • @user11815898 would [this answer](https://stackoverflow.com/questions/9623029/python-xlrd-unsupported-format-or-corrupt-file) be applicable to you? – realr Jul 21 '19 at 19:51
  • Fab - seems to be working now. Really appreciate your help! – user11815898 Jul 21 '19 at 19:54
  • Would you mind sharing what worked in your case? Thanks ! – realr Jul 21 '19 at 19:57
  • 1
    Converting to .csv format and remembering to change from read_excel to read_csv (rookie error!) – user11815898 Jul 21 '19 at 21:11