
I am trying to read several Excel files using pd.read_excel, but I get this error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 114-115: unexpected end of data.

So I tried adding encoding="latin1", which produced a different error: TypeError: read_excel() got an unexpected keyword argument 'encoding'.

When I save the xls file as csv in Excel and then read the csv with encoding="latin1", it works. However, I want to read the xls file directly without converting it to csv. Is there a way to fix this? Thank you.

Edit: the import works if I use xlsx rather than xls.

Jason
  • Your file might have a different encoding than what you assume. See https://stackoverflow.com/a/63478895/6018688 and https://docs.python.org/3/library/codecs.html#standard-encodings – fabianegli Oct 09 '20 at 01:02
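Before trying different encodings, it may help to check what the .xls file actually contains. The following is only a diagnostic sketch using the standard library: `sniff_excel_container` and `sniff_file` are hypothetical helper names, not part of pandas or xlrd. It compares the first bytes of a file against the well-known signatures of the legacy binary Excel container (OLE2) and of zip archives (which is what .xlsx files are). A file that reports "other" is probably text or HTML saved with an .xls extension, which is an assumption worth ruling out here rather than a confirmed cause.

```python
# Leading "magic" bytes of the two container formats Excel uses.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a zip archive


def sniff_excel_container(head: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "other"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as fh:
        return sniff_excel_container(fh.read(8))
```

Calling `sniff_file("report.xls")` (the file name is a placeholder) on each problem file shows whether it is a real binary workbook before any reader is involved.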

1 Answer

This changed in pandas 1.1.0: encoding is no longer a parameter of read_excel(). The release notes say:

read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH34464)

You could try the following:

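import pandas as pd
import xlrd
wb = xlrd.open_workbook(path, encoding_override='latin1')  # path: your .xls file
df = pd.read_excel(wb)  # read_excel accepts an already opened xlrd workbook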
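As a rough sketch of how this could be wrapped up for a mix of .xls and .xlsx files: the helper names `engine_for` and `read_excel_any` and the parameter `xls_encoding` are made up for illustration, and the sketch assumes that pandas, xlrd and openpyxl are installed. Note that xlrd describes encoding_override as a way to overcome missing or bad codepage information in older files, so it may have no effect on a file that fails while decoding UTF-16 text.

```python
import os


def engine_for(path: str) -> str:
    """Pick a read_excel engine from the file extension (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".xls":
        return "xlrd"        # xlrd handles the legacy binary format
    if ext in (".xlsx", ".xlsm"):
        return "openpyxl"    # openpyxl handles the zip-based formats
    raise ValueError(f"unsupported extension: {ext!r}")


def read_excel_any(path: str, xls_encoding=None):
    """Read one Excel file, optionally overriding the codepage for .xls."""
    # Imported lazily so engine_for() can be used without pandas installed.
    import pandas as pd

    engine = engine_for(path)
    if engine == "xlrd" and xls_encoding is not None:
        import xlrd

        # Open the workbook ourselves so the codepage can be overridden,
        # then hand the opened workbook to pandas.
        wb = xlrd.open_workbook(path, encoding_override=xls_encoding)
        return pd.read_excel(wb, engine="xlrd")
    return pd.read_excel(path, engine=engine)
```

For example, `read_excel_any("data.xls", xls_encoding="latin1")` would take the xlrd path, while `read_excel_any("data.xlsx")` goes straight to openpyxl; both file names are placeholders.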
noah
  • Thank you, but I still get the error `UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 114-115: unexpected end of data` – Jason Oct 09 '20 at 00:31
  • I tried this but found out that xlrd only works for .xls files. My Excel file contains accented characters; when reading from csv I can pass encoding='utf-8'. Why would pandas remove this ability when reading Excel files? – boardtc Jan 20 '21 at 19:40
  • xlrd was restricted to reading only .xls files for security reasons. See https://stackoverflow.com/a/65266497/8217112 – noah Jan 20 '21 at 19:48
  • I just downgraded: pip install pandas==1.0.3 – AJ AJ Apr 25 '21 at 06:20