286

I am trying to read a macro-enabled Excel worksheet using pandas.read_excel with the xlrd library. It runs fine locally, but when I push the same code to PCF, I get this error:

2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] df1=pd.read_excel(os.path.join(APP_PATH, os.path.join("Data", "aug_latest.xlsm")),sheet_name=None)

2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] return open_workbook(filepath_or_buffer)
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] File "/home/vcap/deps/0/python/lib/python3.8/site-packages/xlrd/__init__.py", line 170, in open_workbook
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
2020-12-11T21:09:53.441+05:30 [APP/PROC/WEB/0] [ERR] xlrd.biffh.XLRDError: Excel xlsx file; not supported

How can I resolve this error?

Peter Mortensen
Vignesh K
  • 2
    Does this answer your question? [pandas cannot open xlsx file](https://stackoverflow.com/questions/65250207/pandas-cannot-open-xlsx-file) – Chris Withers Dec 13 '20 at 18:29

2 Answers

506

As noted in the release email, linked to from the release tweet, shown in a large orange warning on the front page of the documentation, and stated (less orange, but still present) in the README on the repository and in the release on PyPI:

xlrd has explicitly removed support for anything other than xls files.

In your case, the solution is to:

  • make sure you are on a recent version of pandas, at least 1.0.1 and preferably the latest release. Version 1.2 will make this even clearer.
  • install openpyxl: https://openpyxl.readthedocs.io/en/stable/
  • change your Pandas code to be:
    df1 = pd.read_excel(
        os.path.join(APP_PATH, "Data", "aug_latest.xlsm"),
        engine='openpyxl',
    )
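
As a rough sketch of the same fix for code that handles more than one file type, the engine can be chosen from the file extension. The helper names `pick_engine` and `read_all_sheets` are made up for illustration and are not part of pandas. The engine strings are the ones `pandas.read_excel` accepts, and each one assumes the matching package (openpyxl, xlrd, odfpy, pyxlsb) is installed. Passing `sheet_name=None`, as in the question, returns a dict of DataFrames keyed by sheet name, so the sheet names do not need to be known in advance.

```python
import os

# Engine names accepted by pandas.read_excel, keyed by file extension.
# Each engine needs its own package installed (an assumption of this sketch).
ENGINE_BY_EXTENSION = {
    ".xls": "xlrd",        # xlrd 2.x reads only this legacy format
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",   # macro-enabled workbooks use the same container
    ".ods": "odf",
    ".xlsb": "pyxlsb",
}


def pick_engine(path):
    """Hypothetical helper: map a file extension to a pandas engine name."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return ENGINE_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no known Excel engine for extension {ext!r}")


def read_all_sheets(path):
    """Hypothetical helper: read every sheet into a dict of DataFrames."""
    import pandas as pd  # imported lazily so pick_engine works without pandas

    return pd.read_excel(path, sheet_name=None, engine=pick_engine(path))
```

With this in place, `read_all_sheets(os.path.join(APP_PATH, "Data", "aug_latest.xlsm"))` would go through openpyxl rather than xlrd.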
    
Peter Mortensen
Chris Withers
  • what if you don't know the sheet name? can you pass this to pd.ExcelFile? – Christopher Turnbull Dec 13 '20 at 18:10
  • 15
    Chris, thanks for the xlrd update to support Python 3.9. However, this is a major change in the package with no deprecation warning, so I would suggest a more informative error message, e.g. clarifying when (date and version) xlrd dropped support for non-xls files. – khyox Dec 14 '20 at 02:05
  • 1
    @kyox - there was a notice on the repo for over a year and various announcements on the mailing list and elsewhere going back over four years. – Chris Withers Dec 14 '20 at 07:17
  • 2
    @ChristopherTurnbull specifying the sheet name is optional. If you omit it, the first sheet in the file will be opened. – data.dude Dec 14 '20 at 11:36
  • 1
    As per this pandas developer and the discussion above it [link](https://github.com/pandas-dev/pandas/issues/28547#issuecomment-743125066) apparently xlrd no longer supports .xlsx files. One should wait for the newest pandas version 1.2.0 or put the parameter read_excel(engine='openpyxl') – BrunoSE Dec 22 '20 at 15:27
  • 2
    I installed pandas==1.1.4 and xlrd==1.2.0 – Kairat Koibagarov Jan 12 '21 at 08:08
  • 23
    Installing the module `pip install openpyxl` and including in all my read_excel functions the openpyxl engine `read_excel("my.xlsx",engine='openpyxl') ` saved my code and my time! Thank you so much @ChrisWithers! – Corina Roca Jan 13 '21 at 13:21
  • 9
    As a user who didn't actually KNOW pandas was using xlrd to open xlsx files, a deprecation warning coming from the code would have been REALLY useful... I can't read all of the mailing lists of all of the libraries that I might POSSIBLY be using, somewhere 3 layers deep in my code... – Brian Postow Jun 10 '21 at 18:29
  • 35
    Good answer, but the passive aggressive, condescending tone isn't helpful to the numerous less technical users of pandas. Like a grumpy TSA screener, you're assuming that every member of the public is as deeply familiar as you are with a piece of software. – JPKab Jun 23 '21 at 21:20
  • 2022 update - this answer is still correct, but the latest version of Pandas supports xlsx files. I had this error with version 1.1.4 but after upgrading to 1.3.5 the error is gone. `pip install --upgrade pandas` – Sam Oct 27 '22 at 15:36
268

The previous version, xlrd 1.2.0, may appear to work, but it could also expose you to potential security vulnerabilities. With that warning out of the way, if you still want to give it a go, type the following command:

pip install xlrd==1.2.0
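
To see which situation an environment is in before pinning anything, a small check like the one below may help. It is only a sketch: the helper names are invented for illustration, and it relies on the fact stated in the other answer that xlrd dropped everything except .xls as of 2.0.0.

```python
from importlib import metadata


def xlrd_reads_xlsx(version):
    """Hypothetical helper: True only for xlrd 1.x, which still parsed .xlsx."""
    major = int(version.split(".")[0])
    return major < 2


def installed_xlrd_version():
    """Return the installed xlrd version string, or None if xlrd is absent."""
    try:
        return metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        return None


version = installed_xlrd_version()
if version is None:
    print("xlrd is not installed")
elif xlrd_reads_xlsx(version):
    print(f"xlrd {version} still reads .xlsx, with the security caveats above")
else:
    print(f"xlrd {version} reads .xls only; use engine='openpyxl' for .xlsx/.xlsm")
```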
Peter Mortensen
tryhard
  • 60
    This is absolutely the wrong answer. Do not use xlrd for reading xlsx files, use https://openpyxl.readthedocs.io/en/stable/. – Chris Withers Dec 12 '20 at 14:38
  • 4
    @tryhard What do you mean by "potential security vulnerabilities"? – Ric S Dec 16 '20 at 09:15
  • 1
    @RicS - that was from my edit. .xlsx files are zip files containing xml, both zip and xml have well published security issues that xlrd did a poor job of addressing. – Chris Withers Dec 19 '20 at 07:55
  • 4
    @ChrisWithers why this decision instead of fixing support for xlsx? – robertspierre Dec 20 '20 at 15:36
  • 1
    Lower version of xlrd might have some vulnerabilities but some (old) libraries require this exact version of xlrd – DevX May 17 '21 at 06:48
  • I sincerely did not understand yet which security vulnerabilities you are talking about. – Maf Aug 06 '22 at 14:02
  • @Chris Withers - What if I have a file content (from `requests.get` download) but not an excel file? Now I must create a file instead of `xlrd` `file_contents=content`? – mirek Nov 04 '22 at 08:11