I tried reading Excel files with the following code:

import os
import xlrd

files = os.listdir(".")[1:101]


for file in files:
    workbook = xlrd.open_workbook(file)

But I got this error message:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\r\n\r\n\r\n'

So I tried opening the files one by one.

I found that files named like "14.08.01-08.07.xlsx.xlsx" are readable, but files named like "14.08.22-08.28.xlsx.xls" are not.

So I opened the files and found that the files with the extension "xlsx.xls" have a problem with encoding.

These files include Korean characters, so I tried opening them with the encoding changed to UTF-8, but in vain.

In conclusion, I think I cannot read the xlsx.xls files because of the encoding problem.

Is there any way to solve this sort of problem?

Giordano
Jay_094
  • Maybe [this](http://stackoverflow.com/questions/3511743/using-xlrd-to-read-excel-xls-file-containing-chinese-and-or-hindi-characters) helps? – lrnzcig Aug 24 '16 at 08:27
  • Are they `xls` or `xlsx` format - strange file extension naming going on there... – Jon Clements Aug 24 '16 at 08:30
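
Regarding the format question in the comment above: the bytes quoted in the error (`b'\r\n\r\n\r\n\r\n'`) are what xlrd found at the start of the file, so one way to check what a file really is, regardless of its extension, is to compare its first bytes with the well-known container signatures. A minimal sketch, where `sniff_excel_format` is a hypothetical helper name and not part of xlrd:

```python
# Leading bytes of the two container formats used by Excel workbooks.
XLS_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy .xls)
XLSX_SIGNATURE = b"PK\x03\x04"                     # ZIP archive (.xlsx)


def sniff_excel_format(path):
    """Guess the real format of a file from its first bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(XLS_SIGNATURE):
        return "xls"
    if head.startswith(XLSX_SIGNATURE):
        return "xlsx"
    # Anything else (plain text, HTML, CSV, ...) is not a binary workbook.
    return "unknown"
```

A file that starts with line breaks, as in the error above, would come back as `"unknown"`. That would suggest it is not a binary Excel workbook at all (it may be text or HTML saved under an Excel extension), in which case changing the encoding is unlikely to help.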

1 Answer

Try `xlrd.open_workbook(file, encoding_override="utf-8")`.
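
As a rough sketch of how that call could be applied to a whole directory: the helper names `list_excel_files` and `open_all` below are made up for illustration, and the code assumes the third-party `xlrd` package is installed. It selects files by extension instead of slicing `os.listdir`, and it records which files fail instead of stopping at the first error.

```python
import os


def list_excel_files(directory="."):
    """Return sorted paths of regular files whose names end in .xls or .xlsx."""
    paths = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith((".xls", ".xlsx")):
            paths.append(path)
    return sorted(paths)


def open_all(paths, encoding="utf-8"):
    """Try to open each path with xlrd; return (opened, failed) dictionaries."""
    import xlrd  # third-party; imported here so list_excel_files works without it

    opened, failed = {}, {}
    for path in paths:
        try:
            # encoding_override replaces missing or wrong codepage information
            # in older .xls files; it does not repair a file that is not a workbook.
            opened[path] = xlrd.open_workbook(path, encoding_override=encoding)
        except xlrd.XLRDError as exc:
            failed[path] = str(exc)
    return opened, failed
```

Calling `open_all(list_excel_files("."))` would then give one dictionary of workbooks that opened and one of error messages per file. If the "Expected BOF record" message still shows up for the "xlsx.xls" files, the cause is probably the file format itself and not the encoding.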

Sergey Gornostaev