I have a folder of a few hundred Excel files all organized identically with nine sheets in each workbook. I am running the following code to iterate over the files and create one dataframe for each worksheet across all workbooks (so dataframe "sheet_a_df" will be sheet "a" from every workbook concatenated into a single dataframe).
sheet_a_df = pd.DataFrame()
for file in glob.glob('C:\\Users\*.xlsx'):
df = pd.read_excel(file,sheetname='a')
sheet_1_df = sheet_1_df.append(df,ignore_index=True).dropna()
sheet_b_df = pd.DataFrame()
for file in glob.glob('C:\\Users\\*.xlsx'):
df = pd.read_excel(file,sheetname='b')
sheet_b_df = sheet_b_df.append(df,ignore_index=True).dropna()
# And so on for all nine sheet names...
However, this requires copy and pasting the code nine times (once for each sheet).
Is there a more appropriate way to do this?
Reviewing this question, I understand dictionaries are the way to go for creating multiple dataframes in a for loop. I am also trying to name each df according to the worksheet's name. I created a list of my sheet names and tried the following code, but am getting a KeyError that simply returns the first sheet's name.
sheet_names = ['a',
'b',
'c',
...,]
df_dict = {}
for file in glob.glob('C:\\Users\*.xlsx'):
for sheet in sheet_names:
df = pd.read_excel(file,sheetname=sheet)
df_dict[sheet] = df_dict[sheet].append(df)
Is there a way to fix the above code to create all nine dfs while naming them according to the sheets they come from?