I have a dataframe containing a path to an excel file, a sheet name and an id, each in one column:
df = pd.DataFrame([['00100', 'one.xlsx', 'sheet1'],
['00100', 'two.xlsx', 'sheet2'],
['05300', 'thr.xlsx', 'sheet3'],
['95687', 'fou.xlsx', 'sheet4'],
['05300', 'fiv.xlsx', 'sheet5']],
columns=['id', 'file', 'sheet'])
This dataframe looks like:
id file sheet
0 00100 c:\one.xlsx sheet1
1 00100 c:\two.xlsx sheet2
2 05300 c:\thr.xlsx sheet3
3 95687 c:\fou.xlsx sheet4
4 05300 c:\fiv.xlsx sheet5
I made a function to use with apply, which will read each file and return a dataframe.
def getdata(row):
file = row['file']
sheet = row['sheet']
id = row['id']
tempdf = pd.ExcelFile(file) # Used on purpose
tempdf = tempdf.parse(sheet) # Used on purpose
tempdf['ID'] = id
return tempdf
I then use apply over the initial dataframe so it will return a dataset for each row. The problem is, I don't know how to store the dataframes created in this way.
I tried to store the dataframes in a new column, but the column stores None:
df['data'] = df.apply(getdata, axis=1)
I tried to create a dictionary but the ways that came to my mind were plain useless:
results = {df.apply(getdata, axis=1)} # for this one, in the function I tried to return id, tempdf
In the end, I ended converting the 'id' column to an index to iterate over it in the following way:
for id in df.index:
df[id] = getdata(df.loc[id], id)
But I want to know if there was a way to store the resulting dataframes without using an iterator.
Thanks for your feedback.