AssertionError with pandas when reading excel

Question

I'm trying to read an xlsx file into python using pandas.
I've done this thousands of times before but some reason it is not working with a particular file.

The file is downloaded from another source and I get an AssertionError (see end) when reading with pandas:

df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1")

The variable is defined for the path. The path exists (os.path.exists(path) returns True).

When I copy the contents of the file and paste the values in a new excel doc, this new one will open with the read_excel() method.

When I copy the contents of the file and paste the formatting in a new excel, this new one will open with the read_excel() method.

It doesn't seem to be the values or the formatting.

I am guessing this could be an encoding issue?
Thank you for any help.

    df1 = pandas.read_excel(snap1)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
    io = ExcelFile(io, engine=engine)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 206, in __init__
    self.book = xlrd.open_workbook(io)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
    ragged_rows=ragged_rows,
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 794, in open_workbook_2007_xml
    x12sheet.process_stream(zflo, heading)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 531, in own_process_stream
    self_do_row(elem)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 597, in do_row
    assert 0 <= self.rowx < X12_MAX_ROWS
AssertionError

based on the assertion check being `0 <= self.rowx < X12_MAX_ROWS` I'm guessing that the `rowx` is either negative (Idk why that would happen) or more then what ever `X12_MAX_ROWS` is, is your spreadsheet remarkably large? — Tadhg McDonald-Jensen, May 27 '16 at 14:57
then is it possible that there is a stray value way out of bounds in the file? what if you backup the content, select all and delete then paste back in the content? — Tadhg McDonald-Jensen, May 27 '16 at 15:01
I selected all when I copied to the other excel file.....when I try the read_table method instead it says something about a 0x89 found? Is this possibly a source of error? — Eoin, May 27 '16 at 15:04
THe error is as follows: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte — Eoin, May 27 '16 at 15:05
I had a similar error, and the xlrd-people seem to be [aware](https://github.com/python-excel/xlrd/issues/56). For me it had to do with the `row=` set to 0 in one of the underlying xml-files. I could solve it by unpacking the xlsx, correcting the value to 1 and rezipping the files. Now I'm looking for a more automatic solution — Maarten Fabré, Nov 30 '16 at 11:30
Since it is was an encoding error I faced, I used the `read_table()` method in tandem with the `encoding` parameter and "utf-16". This solved the issue as the text in one column was in a foreign language (Korean) and so the characters needed 16bit encoding rather than 8 — Eoin, Nov 30 '16 at 14:32
@TadhgMcDonald-Jensen What if my spreadsheet does have many rows? >1.1M. I am also facing the same error of assertion. I have no problems reading this file into R. — Jantje Houten, May 22 '21 at 19:03

score 3 · Answer 1 · answered Oct 02 '20 at 13:48

Look at your system for the file xlsx.py.

In your computer it's apparently at C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py

Search the line :

X12_MAX_ROWS = 2 ** 20

and change it so something like

X12_MAX_ROWS = 2 ** 22

This will push the limit of the number of lines from 1 million lines to 4 million lines.

Prutha Modak · Answer 2 · 2018-08-17T07:13:37.280

2

In my case, I was using xlrd package to read excel and I got the same error of Assertion. Open your xlrd package from site-packages, and from that open sheet.py (https://github.com/python-excel/xlrd/blob/master/xlrd/sheet.py)

Find this code in sheet.py

    if self.biff_version >= 80:
        self.utter_max_rows = 65536
    else:
        self.utter_max_rows = 16384

Convert the above one into...

 #if self.biff_version >= 80:
 self.utter_max_rows = 65536
 #else:
 #      self.utter_max_rows = 16384

Now try to run your program... Problem will be solved..:)

edited Aug 17 '18 at 07:13

answered Aug 17 '18 at 07:05

Prutha Modak

31
3

Any insight into how / what is happening here? – Yaakov Bressler Jun 10 '20 at 05:09

score 0 · Answer 3 · edited May 23 '17 at 12:24

Just for completeness, I had a similar problem where the row number of the first row was incorrect, I fixed my problem by changing xlsx-file with code adapted from this answer

def repair_broken_excelfile(zipfname, *filenames, new_name=None):
    # https://stackoverflow.com/a/4653863/1562285
    import tempfile
    import zipfile
    import shutil
    import os
    tempdir = tempfile.mkdtemp()
    try:
        tempname = os.path.join(tempdir, 'new.zip')
        with zipfile.ZipFile(zipfname, 'r') as zipread:
            with zipfile.ZipFile(tempname, 'w') as zipwrite:
                for item in zipread.infolist():
                    print('fn: ' + item.filename)
                    if item.filename not in filenames:
                        data = zipread.read(item.filename)

                        zipwrite.writestr(item, data)
                    else:

                        data = zipread.read(item.filename)
                        data = data.replace(b'<row r="0" spans="">', b'<row r="1" spans="">')
                        zipwrite.writestr(item, data)
                        pass
        if not new_name:
            new_name = zipfname
        shutil.move(tempname, new_name)
    finally:
        shutil.rmtree(tempdir)

Apparently there is a fix underway in xlrd

score 0 · Answer 4 · answered Jul 11 '19 at 16:53

0

Have encountered the same problem, I save file under xml format: "Save as type: XML Spreadsheet 2003" on window. Then I open the file and save as xlsx format. The new file no longer gives error message.

answered Jul 11 '19 at 16:53

ibn

1
1

score 0 · Accepted Answer · answered Mar 19 '20 at 08:23

0

The file contained Korean characters in the text. These needed alternative encoding. Using the "encoding" parameter in the read_excel() method resolved the issue.

df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1", encoding="utf-16")

answered Mar 19 '20 at 08:23

Eoin

357
1
4
20

there is no parameter encoding for read_excel method. why is this even selected as an answer – Abdulsalam Almahdi Jan 12 '23 at 14:09
You may be looking at the current pandas version which doesn't support it. – Eoin Jan 14 '23 at 08:33
Go back in time to the version I likely used (this should have been included on the original question) and the method accepts and passes key word arguments to the nested method. https://github.com/pandas-dev/pandas/blob/0.19.x/pandas/io/excel.py – Eoin Jan 14 '23 at 08:35

score 0 · Answer 6 · answered Aug 06 '20 at 12:30

0

Sometimes this can be resolved just by deleting the (blank) lines below your table in Excel.

answered Aug 06 '20 at 12:30

HannibalTheBarbarian

36
1
4

AssertionError with pandas when reading excel

6 Answers6

Linked