I have some excel files with this configuration.
Excel sheet that needs to be written to a pandas dataframe
That is, instead of writing Header3/Header31 to column D, Header4/Header41 to column E, and Header5/Header51 to column F, the user wrote in the way shown in the Figure. Note that the users are the same for all variables (headers) and a blank space separate the next 'batch' of headers. Is there any way to read this file as a pandas dataframe? I can't just correct the excel file manually, then read in Python, as there are many files and many columns.
Intended way of the dataframe to be shown in a pandas dataframe
I have been doing a research through Stack Overflow to try to find an answer and I found some interesting similar problems, but not quite what I need to do.
Link 1 This one has different headers through the sheet, but the OP needed only a specific part of dataframe. The accepted answer used open(file)
and read the file until the correct header was found. I thought about doing something like this, but how could I stop the reading column by column,as I need them all, not only one and then pass to read_excel
.
Link2 This one also is similar, but the OP needed to not include data after the 'break' occur. That is, when the blank line separating the other headers occurs, the loop is ended and results go to read_excel
.
I think the way to go will be through file reading with open
instead of going directly to read_excel
but how to read column by column?
Thank you!