Reading a formatted Excel file in Python

Question

I am having a problem reading an xlsx file with pandas. The file is lightly formatted. This is the file: sample.xlsx

I am using the following code in Python3:

>>> import pandas as pd
>>> file = pd.ExcelFile('sample.xlsx')
>>> file.sheet_names
>>> temp = file.parse('Named Insured')
>>> temp.shape

The shape is reported as (740, 10), which does not match the original file. The extracted data is also jumbled.

Your question is not very well defined. What is different, and what exactly is jumbled? If you mean that the first rows look odd, you probably don't want them, since they are not part of the table. This can be solved with: temp = file.parse('Named Insured', skiprows=3, header=0) – Bram Zijlstra, Jan 20 '18 at 10:23
Try printing temp.head() and comparing it with the original file. It's different. – Akash Kumar, Jan 20 '18 at 18:42

ralf htp · Accepted Answer · 2018-01-20T10:36:52.197


The file is in the 2007-2013 Excel XML format (according to LibreOffice).

pandas.read_excel() does not work on it. See "Read Excel XML .xls file with pandas" for an analysis of the problem and possible solutions.
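Before picking a parser, it may help to check which container format the file really uses, since the extension alone can mislead. The following is only a rough sketch using the standard library; the function name sniff_excel_format and its return labels are hypothetical and not part of pandas. It relies on two well-known facts: a modern .xlsx is a ZIP archive, and a legacy binary .xls starts with the OLE2 signature.

```python
import zipfile

# Signature that opens OLE2 compound files (legacy binary .xls workbooks).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_excel_format(path):
    """Guess the container format of a spreadsheet file (hypothetical helper).

    Returns one of "zip", "ole2", "xml" or "unknown".
    """
    with open(path, "rb") as fh:
        head = fh.read(8)

    # A ZIP archive is what a real .xlsx (Office Open XML) file is.
    if zipfile.is_zipfile(path):
        return "zip"
    # Legacy binary .xls.
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    # Plain XML text, e.g. the Excel 2003 XML spreadsheet format.
    # Skip an optional UTF-8 byte order mark and leading whitespace first.
    if head.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<"):
        return "xml"
    return "unknown"
```

Calling sniff_excel_format('sample.xlsx') would then suggest a direction: "zip" points to a regular workbook that pandas.read_excel() should be able to open (for example with engine="openpyxl", assuming openpyxl is installed), while "xml" points to the plain-XML case discussed in the linked question.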

edited Jan 20 '18 at 10:36

answered Jan 20 '18 at 10:16

ralf htp

sample.xlsx:1:2: not well-formed (invalid token) – Akash Kumar Jan 20 '18 at 11:23
