Reading Excel file to Python fails due to number formatted as text

Question

I have a ton of Excel files that each have a column where numbers are formatted as text. Excel gives the error "the number in this cell is formatted as text or preceded by an apostrophe" - see the third column where the cells have a green triangle.

My goal is to open all of these files in Pandas without having to manually open each of them and convert the column to number. However, pd.read_excel() fails with the following xlrd error:

XLRDError: ZIP file contents not a known type of workbook

Unsurprisingly when I use xlrd directly: wb = xlrd.open_workbook(filename) I get the same error.

I also tried openpyxl: wb = openpyxl.load_workbook(filename), and it gives me this:

KeyError: "There is no item named 'xl/_rels/workbook.xml.rels' in the archive"

I confirmed that the file is openable by both pandas (xlrd) and openpyxl if I manually convert the column to number in excel and re-save the workbook.

Does anyone have any ideas?

Is there a chance you could upload a sample? I'm trying to produce the same error, but not quite getting it. — francisco sollima, Jan 08 '18 at 19:52
you've basically got to use "converters" while reading the excel into a dataframe. Read doc (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html) question seems pretty similar to an earlier post. (https://stackoverflow.com/questions/32591466/python-pandas-how-to-specify-data-types-when-reading-an-excel-file). — murphy1310, Jan 08 '18 at 19:54
Worth noting that the file is invalid because it's missing the workbook relationships file. — Charlie Clark, Jan 09 '18 at 07:46

score 3 · Accepted Answer · answered Jan 08 '18 at 20:02

3

Use "converters" while reading excel.
Docs.

Eg.:

df = pd.read_excel('yourfile.xlsx',sheetname='sheetname',header=0,converters={ "% Chg" : str })

answered Jan 08 '18 at 20:02

murphy1310

647
1
6
13

Reading Excel file to Python fails due to number formatted as text

1 Answers1