I would like to extract multiple tables of a series of excel spreadsheets where some sheets may contain more than one table, to store the tables separately as e.g. csv files. The table could be something like this:
if I read it using pandas read_excel
import pandas as pd
pd.read_excel('table_example.xlsx',header=None)
I would get something like this:
how could I extract the different tables? In my case tables have NaN values, which is potentially an additional complication.
[EDIT1] something similar to the excel tables can be generated using pandas:
df=pd.DataFrame(np.nan,index=range(0,10),columns=range(0,10))
df.iloc[1,2:5]=['t1h1','t1h2','t1h3']
df.iloc[2:5,2:5]=np.random.randn(3,3)
df.iloc[6,3:7]=['t2h1','t2h2','t2h3','t2h4']
df.iloc[7:9,3:7]=np.random.randn(2,4)
I have tried to find the limits of the tables using built-in pandas functions:
df[df.isnull().all(axis=1)]
I could use the first and second row to set an horizontal division and maybe make a first split but I don't know how to select the cells above or below the identified rows. Or even if this is the most convenient approach.
disclaimer: in my case the tables always have a tag in the row above the header, this is because these tables are read by a non-python software which uses them to identify where the table starts. I decided to not consider these tags to ask a more generic problem, that other people may encounter.