170

Please see my code below:

import pandas
df = pandas.read_excel('cat.xlsx')

After running that, it gives me the following error:

Traceback (most recent call last):
  File "d:\OneDrive\桌面\practice.py", line 4, in <module>
    df = pandas.read_excel('cat.xlsx')
  File "D:\python\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "D:\python\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "D:\python\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "D:\python\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "D:\python\lib\site-packages\xlrd\__init__.py", line 170, in open_workbook
    raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported

I tried uninstall and reinstall Pandas with the pip command. The error persists. I have xlrd 2.0.1 and Pandas 1.1.5 installed.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
LNQ
  • 2,587
  • 2
  • 5
  • 11
  • 1
    @LNQ can you add the commands you used to the question. Also need your `pandas` version and `xlrd` version, do `pip freeze | findstr pandas` and `pip freeze | findstr xlrd`. Expecting you are using python3. – รยקคгรђשค Dec 11 '20 at 11:36
  • 4
    @LNQ Read the warning at https://pypi.org/project/xlrd/. I think they dropped support for anything other than `xls` format. You can use a different engine like `openpyxl`. – รยקคгรђשค Dec 11 '20 at 11:53
  • 2
    @รยקคгรђשค oh, thanks. I reinstalled an older version of xlrd and it worked. – LNQ Dec 11 '20 at 11:59
  • 3
    in my case, upgrading `pandas` to `1.2.0` solved the issue (I have `xlrd` 2.0.1 ) – jjrr Jan 17 '21 at 18:36

5 Answers5

264

As noted in the release email, linked to from the release tweet and noted in large orange warning that appears on the front page of the documentation, and less orange but still present in the readme on the repo and the release on pypi:

xlrd has explicitly removed support for anything other than xls files.

This is due to potential security vulnerabilities relating to the use of xlrd version 1.2 or earlier for reading .xlsx files.

In your case, the solution is to:

  • make sure you are on a recent version of pandas, at least 1.0.1, and preferably the latest release.
  • install openpyxl: https://openpyxl.readthedocs.io/en/stable/
  • change your pandas code to be:
    pandas.read_excel('cat.xlsx', engine='openpyxl')
    

Edit: Currently, pandas >= 1.2 addresses this issue. (Release Notes)

flutefreak7
  • 2,321
  • 5
  • 29
  • 39
Chris Withers
  • 10,837
  • 4
  • 33
  • 51
  • 79
    This is a prime example on how versions should *not* be released. This change breaks a lot of existing code, and I don't recall ever seeing a deprecation warning with xlrd reading xlsx files. If I were you, I would quickly release a new version that does support xlsx files, and gives a very clear deprecation warning saying that the *next* version will not support it. This will give people time to, you know, fix their systems before they break. People don't usually read package readmes or listen for tweets on packages that read excel files. This is not the proper way of notifying them. – zmbq Dec 14 '20 at 08:19
  • 8
    There was a deprecation warning on the whole library for over a year, and an announcement about this four years ago. Reading documentation and mailing list announcements is important for just this type of issue. The changes needed here are trivial, especially in light on the potential security vulnerabilities. – Chris Withers Dec 14 '20 at 08:56
  • 9
    @ChrisWithers sorry and thanks for all your hard work. Please understand that your library was mainly used as a dependency, and we don't go scouring the pages of every dependency, that's why the visibilty of your messages were low. We can simply pass `engine="openpyxl"` to `pd.ExcelFile`, but your library parses excel files in different ways and code was written to take that into account which means it is not trivial to fix. The warnings should have been in decorators in code. I wish I knew, but I did not. Again thanks for your work. Would support. – nurettin Dec 21 '20 at 09:57
  • 1
    @ChrisWithers I clicked on all the above links and didn't find an explanation of the security risk. Only the advice that "xlrd has become unreliable in Python 3.9". This would seem to suggest I'm ok using it with Python 3.7 for a while yet. Although I create most of the files I read, I would be curious to know about the nature of the security risk. – adr Jan 15 '21 at 15:36
  • 5
    openpyxl as a ton of quirks, it's a monumental pain to work with. If the excel file has DateTime columns that are not wide enough, excel replace the output by a template "XXXX"... and openpyxl tries to return the template object instead of the data, which crashed everything. It is but one of the many issues I discovered working with openpyxl... No solutions except manual modifications on files, which is a big no-no with big data. – Florian Fasmeyer Mar 25 '21 at 08:14
73

The latest version of xlrd (2.0.1) only supports .xls files.

If you are prepared to risk potential security vulnerabilities, and risk incorrect parsing of certain files, this error can be solved by installing an older version of xlrd.

Use the command below in a shell or cmd prompt:

pip install xlrd==1.2.0
Chris Withers
  • 10,837
  • 4
  • 33
  • 51
LNQ
  • 2,587
  • 2
  • 5
  • 11
  • 62
    How very incredibly useful to have an excel module that doesn't support excel files. – pianoman102 Dec 12 '20 at 00:17
  • 4
    @painoman102: perhaps you could read the README, or the release notes of the package, or the release email, to see why? – Chris Withers Dec 12 '20 at 15:11
  • 30
    Just to be clear, as the author of this package, I can safely state that this is an incredibly dangerous suggestion. The reason xlsx support was removed is because it had potential security vulnerabilities and no-one was maintaining it. If you choose this approach, rather than the trivial switch to openpyxl, you are risking exposure to these. – Chris Withers Dec 14 '20 at 08:59
  • @ChrisWithers that's right, this way works. – Elnaz Jan 01 '21 at 17:59
  • 5
    @ChrisWithers Unfortunately, openpyxl does not appear to work at all with the excel files I am working with. It is completely unable to parse and always returns an empty dataframe. – user3756718 Jan 13 '21 at 23:13
  • Dear @ChrisWithers thanks for your contributions to this community!, but I had to choose this solution since `openpyxl`didn't give me the same results parsing some Excles in the middle of several processes embedded in a large number of line codes. It is obviously not a great solution!! but as you may understand I wouldn't risk losing the functionality of a the whole software which has been programmed by other developers in this case! – Elias Jan 15 '21 at 14:30
  • 4
    Going back to `xlrd==1.2.0` version worked. Thanks https://exerror.com/xlrd-biffh-xlrderror-excel-xlsx-file-not-supported/ – Moosa Sharieff Mar 27 '21 at 20:37
  • Sorry, I am using `v2.0.1` and it doesn't support. – Coddy Jul 04 '21 at 03:15
  • this works for me with no problem @user3756718 df = pd.read_excel('file.xlsx', sheet_name = 'mysheet', engine='openpyxl') pandas v. 1.1.3, openpyxl 3.0.5 – Chuck Jan 07 '22 at 20:43
2

The latest version of Pandas supports xlsx files. I had this error on 1.1.4 and after upgrading to 1.3.5

pip install --upgrade pandas

I no longer have the XLRDError. Older versions also support xlsx files.

Sam
  • 105
  • 1
  • 6
1

Best way is to probably make openpyxl you're default reader for read_excel() in case you have old code that broke because of this update.

You can do it by changing the default values of the method by going to the _base.py inside the environment's pandas folder. You can find it as follows:

import pandas as pd
print(pd.__file__)

Open the file and find

def read_excel(...)

You will find the default value for engine. Change it to 'openpyxl'

Original tip/answer here: https://stackoverflow.com/a/69577391/7151338

LuisSilva
  • 234
  • 1
  • 4
  • 13
1

I had the same problem using the ExcelFile constructor (for a file containing multiple worksheets) instead of the read_excel method. In that case the solution is:

import pandas

xlsx = pandas.ExcelFile('cat.xlsx', engine='openpyxl')
avandeursen
  • 8,458
  • 3
  • 41
  • 51
  • Please you could have provided more details rather than just posting here how to import and open the file. Am I wrong? – Maf Aug 06 '22 at 14:04
  • 2
    The answer tells you that the `engine` parameter can also be set in the constructor `ExcelFile(…)`, and not just in the `read_excel` method. Took me some time to figure that out, so I added that as an answer here, for others who use the constructor to read `xlsx` files instead of `xls` files. – avandeursen Aug 13 '22 at 13:53
  • 1
    Your answer is fine. Sorry. I wasn't very familiar with this before. – Maf Aug 13 '22 at 17:25