1

It's my first time to use pandas, I have multiple excel files, that i want to combine all into one Excel file using python pandas.

I managed to merge the content of the first sheets in each excel file into one sheet in a new excel file like this shown in the figure below: combined sheets in one sheet

I wrote this code to implement this:

import glob
import pandas as pd
path = "C:/folder"
file_identifier = "*.xls"
all_data = pd.DataFrame()
for f in glob.glob(path + "/*" + file_identifier):
   df = pd.read_excel(f)
   all_data = all_data.append(df,ignore_index=True)

writer = pd.ExcelWriter('combined.xls', engine='xlsxwriter')    
all_data.to_excel(writer, sheet_name='Summary Sheet') 
writer.save()
file_df = pd.read_excel("C:/folder/combined.xls")
# Keep only FIRST record from set of duplicates
file_df_first_record = file_df.drop_duplicates(subset=["Test summary", "Unnamed: 1", "Unnamed: 2", 
"Unnamed: 3"], keep="first")
file_df_first_record.to_excel("filtered.xls", index=False, sheet_name='Summary Sheet')

But I have two issues:

  1. How to remove cells that has "Unnamed" as shown in the previous figure
  2. How to copy other worksheets (the second worksheet in each Excel file, not the first worksheet) from all other Excel files and put it in one Excel file with multiple worksheets and with different students names like shown in the picture.

all worksheets in one excel file

So i managed to combine worksheet1 from all Excel files in one sheet, but now I want to copy A, B, C, D, E worksheets into one Excel file that has all other remaining worksheets in other Excel files.

Each Excel file of the ones I have looks like this single excel file

john adam
  • 15
  • 5
  • I fixed the first point, but now Not sure how to get the worksheets from the other Excel files. – john adam Jun 01 '21 at 09:19
  • try : https://stackoverflow.com/questions/55700544/python-efficient-way-to-append-all-worksheets-in-multiple-excel-into-pandas-data?rq=1 & https://www.statology.org/write-multiple-excel-sheets-pandas/ – p._phidot_ Jun 02 '21 at 04:24

1 Answers1

1

If you want to have all data gathered together in one worksheet you can use the following script:

  1. Put all excel workbooks (i.e. excel files) to be processed into a folder (see variable paths).

  2. Get the paths of all workbooks in that folder using glob.glob.

  3. Return all worksheets of each workbook with read_excel(path, sheet_name=None) and prepare them for merging.

  4. Merge all worksheets with pd.concat.

  5. Export the final output to_excel.

    import pandas as pd
    import glob
    
    paths = glob.glob(r"C:\excelfiles\*.xlsx")
    path_save = r"finished.xlsx"
    
    df_lst = [pd.read_excel(path, sheet_name=None).values() for path in paths]
    df_lst = [y.transpose().reset_index().transpose() for x in df_lst for y in x]
    df_result = pd.concat(df_lst, ignore_index=True)
    df_result.to_excel(path_save, index=False, header=False)
    
mouwsy
  • 1,457
  • 12
  • 20