I'm working on a project where I need to download an Excel sheet from a bank's webpage. When I download it and open it in Excel, a warning appears saying the file is corrupted. I accept, and it opens perfectly. However, when I try to read it in Python with

df = pd.read_excel("path/name.xls") 

an error pops up.

Unsupported format, or corrupt file: Expected BOF record

I've tried different encodings, such as:

df = pd.read_excel("path/name.xls", encoding='utf-16le')

but it still does not work.

I've also tried reading it with read_table, which fails with:

Error tokenizing data. C error: Expected 1 fields in line 6, saw 2

These are the first rows of the dataset. The following rows have the same format as the last ones shown.

Screenshot of the excel

I'm quite new to Python and cannot find a solution to this. Any assistance would be appreciated. Thanks.

  • An XLS file is a binary file. You can't use an encoding, and you obviously can't use `.readlines`. You must use `file1 = io.open(filename, 'rb')`. – Tim Roberts Mar 09 '21 at 01:50
  • Did you try the answers here? https://stackoverflow.com/questions/9623029/python-xlrd-unsupported-format-or-corrupt-file E.g. pd.read_html if the file is actually HTML. You should post the first lines of the raw file if it opens in a text editor. I got that by googling your error string, come on people! – spioter Mar 09 '21 at 13:51
  • @spioter yes. I've tried. No progress made at all – Jose Azadian Mar 09 '21 at 20:16
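
Both comments point at the same check: open the download in binary mode and look at its first bytes before choosing a reader. An Excel warning on open combined with "Expected BOF record" often means the file carries an .xls extension but is not a binary workbook. The following is a minimal sketch of that check, not a confirmed fix for this particular file. `sniff_format` and `read_download` are hypothetical helper names, the 2048-byte sample size is arbitrary, and `pd.read_html` needs an HTML parser such as lxml installed.

```python
import codecs

# Signature of the OLE2 container used by genuine binary .xls files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# .xlsx files are zip archives, so they start with the zip signature.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_format(head: bytes) -> str:
    """Guess the real format from the first bytes of a file (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Not a binary workbook: decode as text, honouring a UTF-16 byte-order mark.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = head.decode("utf-16", errors="replace")
    else:
        text = head.decode("utf-8", errors="replace")
    start = text.lstrip("\ufeff \t\r\n").lower()
    if start.startswith(("<html", "<!doctype", "<table")):
        return "html"
    if start.startswith("<?xml"):
        return "xml"
    return "text"


def read_download(path):
    """Pick a pandas reader based on the sniffed format (hypothetical helper)."""
    import pandas as pd  # imported lazily so sniff_format works without pandas

    with open(path, "rb") as fh:
        head = fh.read(2048)
    kind = sniff_format(head)
    if kind in ("xls", "xlsx"):
        return pd.read_excel(path)
    if kind == "html":
        # read_html returns a list of DataFrames, one per <table> found.
        return pd.read_html(path)[0]
    # For plain text or XML, show the raw start so the layout can be inspected.
    raise ValueError(f"Detected {kind!r}; file starts with: {head[:200]!r}")
```

Calling `read_download("path/name.xls")` either returns a DataFrame or raises with the raw start of the file. If the result is "text", those raw bytes show which delimiter is used and how many header lines precede the table, which is what the read_table error about line 6 suggests is worth checking.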