I am trying to read some excel files in pandas. In some files, the table of interest is not perfectly formatted, i.e. multiple rows are formatted as a single row but each such row has multiple lines. So the data appears fine when you view the excel file. Also when parsing it using pandas, there is indeed a newline character (\n) at the end of each such line.
The problem is that when I read it with read_excel() function, it converts it into a DataFrame which does not consider this line break as a separate row but puts it into one row with \n in it. I would like to write a code that treats/converts each such row with N lines as N rows (using the line-breaks as an indicator for new row).
Is there a way to do it either while parsing the file or post-processing the dataframe in Python?
Here I provide a very simplified version of my dummy excel-file and some code to explain the problem.
Sample Excel-File:
Name | Price
-------------------------------
Coca Cola | 46.66
-------------------------------
Google | 1204.44
Facebook | 177.58
-------------------------------
Berkshire Hathaway | 306513.75
I simply use Pandas' read_excel in Python:
dataframe_parsed = pandas.read_excel(file_name)
print(dataframe_parsed.head())
I get the following DataFrame as output:
Name Price
0 Coca Cola 46.66
1 Google\nFacebook 1204.44\n177.58
2 Berkshire Hathaway 306513.75
The desired output is:
Name Price
0 Coca Cola 46.66
1 Google 1204.44
2 Facebook 177.58
3 Berkshire Hathaway 306513.75
Any help will be highly appreciated.