My Pandas is incorrectly reading values from a .xlsx file

Question

I'm trying to read in a .xlsx file into a dataframe. The .xlsx opened in Excel looks like:

Heading 1	Heading 2	Heading 3
soda	12	4
pop	12	2
cola	12	3

But the dataframe I read in using:

df = pd.read_excel("fileName.xlsx", engine='openpyxl')

looks like this:

Heading 1	Heading 2	Heading 3
soda	0	4
pop	0	2
cola	0	3

Does anyone know why all the values in the second heading are replaced with 0? Or any way I could get around it? I'm not sure considering that the standard read_csv no longer supports .xlsx so I've been using the Openpyxl version. Thanks in advance!

EDIT: So xlrd no longer supports .xlsx either (only .xls) so I installed an older version of xlrd, but I get the same result as seen above using an older version of xlrd as I do using the current version of openpyxl. Still looking for help...

Also I meant to say read_excel in place of read_csv

try converting your excel sheet in csv then use read_csv function OR install xlrd which is default engine for reading excel files — Hassam, Apr 06 '21 at 07:38
When I try with xlrd or without the engine parameter I get the error xlrd.biffh.XLRDError: Excel xlsx file; not supported — Sameer, Apr 06 '21 at 14:54

Nic Moetsch · Answer 1 · 2021-04-07T07:50:49.977

0

You mentioned in your question that read_csv doesn't support .xlsx which is correct, however read_excel does support .xlsx. Since the latest version of xlrd - which previously was the standarad engine for .xlsx files - only openpyxl supports xlrd anymore.

If you run xlrd version 1.2.0 or older, to avoid the openpyxl engine simply try using :

df = pd.read_excel("fileName.xlsx")

Using this old version could expose you to security vulnerabilities however, which is why .xlsx support was discontinue in the first place.

The author ot the xlrd library does answer more indepth in this question's thred.

edited Apr 07 '21 at 07:50

answered Apr 06 '21 at 06:17

Nic Moetsch

846
6
9

When I try doing this I get the error xlrd.biffh.XLRDError: Excel xlsx file; not supported – Sameer Apr 06 '21 at 14:53
I'm sorry, you were indeed correct, xlrd infact doesn't support .xlsx anymore because of security concerns. Maybe [this](https://stackoverflow.com/questions/65250207/pandas-cannot-open-an-excel-xlsx-file) post solves your problem. – Nic Moetsch Apr 07 '21 at 07:11

score 0 · Answer 2 · edited Oct 13 '21 at 13:28

0

I had a similar problem and solved it by specifying the datatype of the column by using the argument 'converters'.

You might want to try

df = pd.read_excel("fileName.xlsx", engine='openpyxl', converters ={'Heading 2':int})

or

df = pd.read_excel("fileName.xlsx", engine='openpyxl', converters ={'Heading 2':str})

edited Oct 13 '21 at 13:28

SmileyProd

788
4
13

answered Oct 13 '21 at 12:21

user17142594

1
1

My Pandas is incorrectly reading values from a .xlsx file

2 Answers2