1

I'm trying to read in a .xlsx file into a dataframe. The .xlsx opened in Excel looks like:

Heading 1 Heading 2 Heading 3
soda 12 4
pop 12 2
cola 12 3

But the dataframe I read in using:

df = pd.read_excel("fileName.xlsx", engine='openpyxl')

looks like this:

Heading 1 Heading 2 Heading 3
soda 0 4
pop 0 2
cola 0 3

Does anyone know why all the values in the second heading are replaced with 0? Or any way I could get around it? I'm not sure considering that the standard read_csv no longer supports .xlsx so I've been using the Openpyxl version. Thanks in advance!

EDIT: So xlrd no longer supports .xlsx either (only .xls) so I installed an older version of xlrd, but I get the same result as seen above using an older version of xlrd as I do using the current version of openpyxl. Still looking for help...

Also I meant to say read_excel in place of read_csv

Sameer
  • 59
  • 1
  • 1
  • 5

2 Answers2

0

You mentioned in your question that read_csv doesn't support .xlsx which is correct, however read_excel does support .xlsx. Since the latest version of xlrd - which previously was the standarad engine for .xlsx files - only openpyxl supports xlrd anymore.

If you run xlrd version 1.2.0 or older, to avoid the openpyxl engine simply try using :

df = pd.read_excel("fileName.xlsx")

Using this old version could expose you to security vulnerabilities however, which is why .xlsx support was discontinue in the first place.

The author ot the xlrd library does answer more indepth in this question's thred.

Nic Moetsch
  • 846
  • 6
  • 9
  • When I try doing this I get the error xlrd.biffh.XLRDError: Excel xlsx file; not supported – Sameer Apr 06 '21 at 14:53
  • I'm sorry, you were indeed correct, xlrd infact doesn't support .xlsx anymore because of security concerns. Maybe [this](https://stackoverflow.com/questions/65250207/pandas-cannot-open-an-excel-xlsx-file) post solves your problem. – Nic Moetsch Apr 07 '21 at 07:11
0

I had a similar problem and solved it by specifying the datatype of the column by using the argument 'converters'.

You might want to try

df = pd.read_excel("fileName.xlsx", engine='openpyxl', converters ={'Heading 2':int})

or

df = pd.read_excel("fileName.xlsx", engine='openpyxl', converters ={'Heading 2':str})
SmileyProd
  • 788
  • 4
  • 13