0

I have roughly 30 excel workbooks I need to combine into one. Each workbook has a variable number of sheets but the sheet I need to combine from each workbook is called "Output" and the format of the columns in this sheet is consistent.

I need to import the Output sheet from the first file, then append the remaining files and ignore the header row.

I have tried to do this using glob/pandas to no avail.

manandearth
  • 804
  • 1
  • 9
  • 25
Jesus
  • 1
  • 1
  • 1
  • 1

3 Answers3

3

You could use openpyxl. Here's a sketch of the code:

from openpyxl import load_workbook

compiled_wb = load_workbook(filename = 'yourfile1.xlsx')
compiled_ws = compiled['Output']

for i in range(1, 30):
    wb = load_workbook(filename = 'yourfile{}.xlsx'.format(i))
    ws = wb['Output']
    compiled_ws.append(ws.rows()[1:]) # ignore row 0

compiled_wb.save('compiled.xlsx')
Omni
  • 1,002
  • 6
  • 12
3

Method shown by Clinton c. Brownley in Foundations for Analytics with Python:

execute in shell indicating the path to the folder with excel files ( make sure the argument defining all_workbooks is correct) and then followed by the excel output file as follows:

python script.py <the /path/ to/ excel folder/> < your/ final/output.xlsx> 

script.py:

import pandas as pd
import sys
import os
import glob
input_path = sys.argv[1]
output_file = sys.argv[2]
all_workbooks = glob.glob(os.path.join(input_file, '*.xlsx'))
all_df = []
for workbook in all_workbooks:
    all_worksheets = pd.read_excel(workbook, sheetname='Output',    index_col=None)
    for worksheet, data in all_worksheets.items:
        all_df.append(data)
data_concatenated = pd.concat(all_df, axis=0, ignore_index=True)
writer = pd.ExcelWriter(output_file)
data_concatenated.to_excel(writer, sheetname='concatenated_Output',  index=False)
writer.save()
manandearth
  • 804
  • 1
  • 9
  • 25
  • Script errors: "all_worksheets.items()" instead of "all_worksheets.items" and axis=1 was needed for me on the pd.concat. "sheetname" must be "sheet_name" in the to_excel (at least recent version) and it should be "sheet_name" in the read_excel to avoid a warning. – Software Prophets Feb 27 '18 at 17:23
  • This Works like a charm with the help of @OutfastSource solution and also installing the xlrd package but it just puts everything side by on axis = 1 and below each other on axis = 0. I need it such that it copy pastes a worksheet then goes to the bottom-most line and then copy pastes the same worksheet from the different workbook in that same folder. So how do i tweak this code for performing the desired Function ?? – Nischaya Sharma Oct 30 '19 at 16:44
3

This will probably get down-voted because this isn't a Python answer, but honestly, I wouldn't use Python for this kind of task. I think you are far better off installing the AddIn below, and using that for the job.

enter image description here

https://www.rondebruin.nl/win/addins/rdbmerge.htm

Click 'Merge all files from the folder in the Files location selection' and click 'Use a Worksheet name' = 'Output', and finally, I think you want 'First cell'. Good luck!

ASH
  • 20,759
  • 19
  • 87
  • 200