I have an excelsheet with multiple tables on the same sheet. These tables are of varying column numbers and varying row numbers. The good thing is however that the table header has a background color, and the table contents are of white background.
I am wondering if I could read in each of these tables as separate dataframes using xlrd or some other package.
The approach that am thinking of right now is pretty lengthy and may not be ideal.
For instance :
import xlrd
book = xlrd.open_workbook("some.xls", formatting_info=True)
sheets = book.sheet_names()
for index, sh in enumerate(sheets):
sheet = book.sheet_by_index(index)
rows, cols = sheet.nrows, sheet.ncols
for row in range(rows):
for col in range(cols):
xfx = sheet.cell_xf_index(row, col)
xf = book.xf_list[xfx]
bgx = xf.background.pattern_colour_index
if bgx != 64:
Header_row = rownum
Then traverse this Header_row and fetch all the column values and have them as dataframe column names. then continue parsing rows of the first column until you encounter a blank cell or a row which has only one or two non-null cells.
As you can see this is getting little lengthy and may not be the optimal way.
Appreciate your help in how can quickly pull out all the tales as separate dataframes.