Combine excel files with pandas

Question

Want to combine a few excel files into One Excel File and in 1 worksheet.

Puesdo-code:

1 create new workbook to hold all new excel files

2 want to be able to run a loop which takes in excel files and converts them all into dataframes

3 writes all files into that newly created workbook skipping one column for each new file written next to it.

Inital Attempt

# Combining dataFrames in excel
excelName = input("Label your excel file: ")
xlsxAUTO = '.xlsx'
excelAutoNamed = excelName + xlsxAUTO

writer = pd.ExcelWriter(excelAutoNamed,engine='xlsxwriter')

wbook = 0

for excel_file in excel_files:
    
    Currentwb = load_workbook(excel_file)
    Currentsheet = Currentwb.worksheets[0]
    MaxCols = Currentsheet.max_column
    print(MaxCols)
    
    df2 = pd.DataFrame()
    df2 = pd.read_excel(excel_file)
    
    if wbook == 0:
        MaxCols = 0
        print(wbook)
    
    df2.to_excel(writer,sheet_name='Validation',startrow=0, startcol=MaxCols+10)
    writer.save()
    wbook = wbook+1
    print(wbook)

Note: the # of columns vary for each excel file, as such the parameter for startCol should be generated from the following:

1get the max column number/letter from the previous dataframe and store that in a temp variable. assign that to the startCol parameter.

2Once again, I dont want them stacked one over another, just next to each other, so the axis=1 trick may/maynot work.

EDIT for bold format changes original vs new original excel file vs newly generated excel file

Christian Karcher · Answer 1 · 2022-06-09T05:46:12.247

0

I would suggest first concatenating all data from the available excel files into a single dataframe and then writing that dataframe into a new excel file. This makes the bookkeeping of the current column obsolete.

# collect excel content into list of dataframes
data = []
for excel_file in excel_files:
    this_df = pd.read_excel(excel_file, engine="openpyxl")
    # add empty column for separation
    this_df[''] = ''
    data.append(this_df)

# concatenate dataframes horizontally
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)

edited Jun 09 '22 at 05:46

answered Jun 08 '22 at 05:17

Christian Karcher

2,533
1
12
17

This worked fine but the header bolding that the two seperate dataframes initally had went away. Is it becuase one column was not skipped when appending? – Anirudh Gattu Jun 08 '22 at 15:22
The implementation above by Christian works fine but just needed to know how to tweak it such that there is a column gap when the next df is appended. Is there a more pythonic approach rather than concating an empty df in middle? @Christian Karcher – Anirudh Gattu Jun 09 '22 at 01:12
I see no other way then to either append an empty df or an empty column to the dataframe. I have updated my example with the latter case. Not sure what you meant with the "header bolding", are you talking about the reformating the column headers? See e.g. https://stackoverflow.com/questions/36694313/pandas-xlsxwriter-format-header – Christian Karcher Jun 09 '22 at 05:45
The original format of the excel file had bold for the column names and row number, but when the new excel file is generated the column names and row number are no longer bolded. I have inlcuded two images for reference. the 'oroginal formatting' picture shows how the bolding was initally in place and the 'newly generated excel file' picture shows how after the new excel file has been generated, the bold format has now disappeared. @Christian Karcher – Anirudh Gattu Jun 09 '22 at 06:12
Basically the red part is no longer bolded. I am guessing this is a edit to this line data.append(pd.read_excel(excel_file, engine="openpyxl")). Sorry familiarizing myself again w Python – Anirudh Gattu Jun 09 '22 at 06:18
Ah, I see, thanks for the clarification. Yes, all formatting gets lost when importing the data via pd.read_excel, since dataframes only contain data and no formatting information. If you want to preserve the formatting as well, I guess this method is not suitable then. – Christian Karcher Jun 09 '22 at 07:28

score -1 · Answer 2 · answered Jun 08 '22 at 22:39

-1

If you want to merge multiple Excel files into one single file, you can do it this way.

import pandas as pd
import numpy as np

import glob
glob.glob("C:\\*.xlsx")

all_data = pd.DataFrame()
for f in glob.glob("C:\\*.xlsx"):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
    print(all_data)
    df = pd.DataFrame(all_data)
    df.shape
    df.to_excel("C:\\final.xlsx", sheet_name='Sheet1')

answered Jun 08 '22 at 22:39

ASH

20,759
19
87
200

I have tried this way, but it seems that the formatting is having a lot of errors, I also mainly wanted it to be appended horizontally. The implementation above by Christian works fine but just needed to know how to tweak it such that there is a column gap when the next df is appended. – Anirudh Gattu Jun 09 '22 at 01:12
If it has to be horizontal, then use pd.concat(). – ASH Jun 09 '22 at 11:04
yep it seemed to work when pd.concat was used with axis = 1 tq – Anirudh Gattu Jun 09 '22 at 13:58

Combine excel files with pandas

2 Answers2