0

I have a dataframe containing a path to an excel file, a sheet name and an id, each in one column:

df = pd.DataFrame([['00100', 'one.xlsx', 'sheet1'],
                   ['00100', 'two.xlsx', 'sheet2'],
                   ['05300', 'thr.xlsx', 'sheet3'],
                   ['95687', 'fou.xlsx', 'sheet4'],
                   ['05300', 'fiv.xlsx', 'sheet5']],
                  columns=['id', 'file', 'sheet'])

This dataframe looks like:

      id         file   sheet
0  00100  c:\one.xlsx  sheet1
1  00100  c:\two.xlsx  sheet2
2  05300  c:\thr.xlsx  sheet3
3  95687  c:\fou.xlsx  sheet4
4  05300  c:\fiv.xlsx  sheet5

I made a function to use with apply, which will read each file and return a dataframe.

 def getdata(row):
    file = row['file']
    sheet = row['sheet']
    id = row['id']
    tempdf = pd.ExcelFile(file)     # Used on purpose
    tempdf = tempdf.parse(sheet)    # Used on purpose
    tempdf['ID'] = id
    return tempdf

I then use apply over the initial dataframe so it will return a dataset for each row. The problem is, I don't know how to store the dataframes created in this way.

I tried to store the dataframes in a new column, but the column stores None:

df['data'] = df.apply(getdata, axis=1)

I tried to create a dictionary but the ways that came to my mind were plain useless:

results = {df.apply(getdata, axis=1)}  # for this one, in the function I tried to return id, tempdf

In the end, I ended converting the 'id' column to an index to iterate over it in the following way:

for id in df.index:
    df[id] = getdata(df.loc[id], id)

But I want to know if there was a way to store the resulting dataframes without using an iterator.

Thanks for your feedback.

0 Answers0