It's my first time to use pandas, I have multiple excel files, that i want to combine all into one Excel file using python pandas.
I managed to merge the content of the first sheets in each excel file into one sheet in a new excel file like this shown in the figure below: combined sheets in one sheet
I wrote this code to implement this:
import glob
import pandas as pd
path = "C:/folder"
file_identifier = "*.xls"
all_data = pd.DataFrame()
for f in glob.glob(path + "/*" + file_identifier):
df = pd.read_excel(f)
all_data = all_data.append(df,ignore_index=True)
writer = pd.ExcelWriter('combined.xls', engine='xlsxwriter')
all_data.to_excel(writer, sheet_name='Summary Sheet')
writer.save()
file_df = pd.read_excel("C:/folder/combined.xls")
# Keep only FIRST record from set of duplicates
file_df_first_record = file_df.drop_duplicates(subset=["Test summary", "Unnamed: 1", "Unnamed: 2",
"Unnamed: 3"], keep="first")
file_df_first_record.to_excel("filtered.xls", index=False, sheet_name='Summary Sheet')
But I have two issues:
- How to remove cells that has "Unnamed" as shown in the previous figure
- How to copy other worksheets (the second worksheet in each Excel file, not the first worksheet) from all other Excel files and put it in one Excel file with multiple worksheets and with different students names like shown in the picture.
all worksheets in one excel file
So i managed to combine worksheet1 from all Excel files in one sheet, but now I want to copy A, B, C, D, E worksheets into one Excel file that has all other remaining worksheets in other Excel files.
Each Excel file of the ones I have looks like this single excel file