Here is my test code, which produces the error:

import pandas

file = 'KBART EDINA.xlsx'
data = list()

with open(file, 'r') as xl_file:
    df = pandas.read_excel(xl_file)

When I run it, I get the following error:

❯ python hack.py
Traceback (most recent call last):
  File "hack.py", line 11, in <module>
    df = pandas.read_excel(xl_file)
  File "/CWD/deliberately/obscured/.direnv/python-3.7.6/lib/python3.7/site-packages/pandas/util/_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "/CWD/deliberately/obscured/.direnv/python-3.7.6/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "/CWD/deliberately/obscured/.direnv/python-3.7.6/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 1072, in __init__
    content=path_or_buffer, storage_options=storage_options
  File "/CWD/deliberately/obscured/.direnv/python-3.7.6/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 954, in inspect_excel_format
    buf = stream.read(PEEK_SIZE)
  File "/another/obscured/path/miniconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 16: invalid start byte

I've tried six different xls, xlsx, and ods files, and they all produce the same error.

I have the following (relevant) libraries installed:

openpyxl          3.0.7
pandas            1.2.4
xlrd              2.0.1

I know the files exist and are readable: I added an `if not os.path.isfile(file): print("####")` clause to prove that.

What am I missing?

CodeGorilla
  • The error has nothing to do with whether the file is readable. It occurs because the decoder hits a byte that is not valid in the expected encoding. Imagine a number cipher where A -> 1, B -> 2, etc. If the input is zero, what letter do you get back? That's basically what a `UnicodeDecodeError` means here. – ifly6 Jun 21 '21 at 14:58
  • Does this answer your question? [Pandas read_excel: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte](https://stackoverflow.com/questions/48647122/pandas-read-excel-utf-8-codec-cant-decode-byte-0xa8-in-position-14-invalid). I'm aware that the specific byte is different. See whether replacing the input file handle with the raw path helps, per https://stackoverflow.com/a/53123720/2741091. – ifly6 Jun 21 '21 at 15:00

1 Answer

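Error between seat and keyboard: `open(file, 'r')` opens the workbook in text mode, so Python tries to decode the binary Excel data as UTF-8 and fails. Pass the path straight to `read_excel` instead of a text-mode file handle.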

Wrong:

import pandas

file = 'KBART EDINA.xlsx'

with open(file, 'r') as xl_file:
    df = pandas.read_excel(xl_file)

Right:

import pandas

file = 'KBART EDINA.xlsx'

df = pandas.read_excel(file)
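
To see why the text-mode handle fails, here is a minimal standard-library sketch that does not use pandas at all. It assumes a hypothetical stand-in file (`sample.bin`, with contents `FAKE_WORKBOOK`) rather than a real workbook: the ZIP signature that .xlsx files begin with, followed by the byte 0xB9 from the traceback. The helper names `text_mode_fails` and `read_binary` are made up for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for a workbook: the ZIP signature that .xlsx files
# start with, followed by a byte (0xB9) that is not a valid UTF-8 start byte.
FAKE_WORKBOOK = b"PK\x03\x04" + bytes([0xB9, 0x00, 0x01])


def text_mode_fails(path):
    """Return True if reading the file as UTF-8 text raises UnicodeDecodeError."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            handle.read()
    except UnicodeDecodeError:
        return True
    return False


def read_binary(path):
    """Read the raw bytes without any decoding."""
    with open(path, "rb") as handle:
        return handle.read()


# Write the sample bytes to a temporary file and compare both modes.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.bin")
    with open(sample_path, "wb") as handle:
        handle.write(FAKE_WORKBOOK)

    failed_as_text = text_mode_fails(sample_path)
    raw_bytes = read_binary(sample_path)

print(failed_as_text)
print(raw_bytes == FAKE_WORKBOOK)
```

Text mode fails on the same kind of byte as in the question, while binary mode returns the data untouched. If a file handle is really needed, opening the workbook with `open(file, 'rb')` and passing that handle to `pandas.read_excel` should also work, since `read_excel` accepts binary file-like objects as well as paths.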
CodeGorilla