-3

How to combine the columns from every sheet, using pandas?

I need to iterate through each sheet in one Excel file, and merge every sheet separately, like the following images.

I have around 1000 sheets in one file, and sheets names are not same.

How to do the iteration and merging?

show sheet1

show sheet2

show sheet3

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158

4 Answers4

1
  • It seems easiest to use pandas.read_excel with sheet_name=None.
    • Specify None to get all sheets.
    • This will create a dict of dataframes with the sheet_names as the keys.
    • Iterate through the keys and values, to combine the two columns and write back to a file.
  • As per your comment, the .fillna() method may be used on either column, as shown in the following code.

Write just the new combined column back to the sheet

import pandas as pd

# create dict of dataframes
df = pd.read_excel('test.xlsx', sheet_name=None)

# iterate through dict
for k, v in df.items():
    with pd.ExcelWriter('combined.xlsx', mode='a', engine='openpyxl') as writer:  # write sheets to a new file
        combined = v.iloc[:, 0].fillna('xxx') + ' ' + v.iloc[:, 1].fillna('xxx')  # combine the first and second column
        combined.to_excel(writer, sheet_name=k, index=False, header=False)

Write the original, and the combined columns back to the sheet

df = pd.read_excel('test.xlsx', sheet_name=None)

for k, v in df.items():
    with pd.ExcelWriter('combined.xlsx', mode='a', engine='openpyxl') as writer:  # write sheets to a new file
        v['comb'] = v.iloc[:, 0].fillna('xxx') + ' ' + v.iloc[:, 1].fillna('xxx')  # combine the first and second column
        v.to_excel(writer, sheet_name=k, index=False, header=False)
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
0

This is explained in many posts(Using Pandas to pd.read_excel() for multiple worksheets of the same workbook)

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

To merge dataframes by columns,

df1.append(df2)
df1.append(df3)
...

The above code will result a merged dataframe saved in df1.

krenerd
  • 741
  • 4
  • 22
0

try to read the 3 excel sheets separately as shown below and then use pd.merge to merge df1, df2, df3 as a single data frame by index or any other columns to match the right columns of all the three data frame

            import pandas as pd
            df1 = pd.read_excel(f_name.xls, 'Sheet1')
            df2 = pd.read_excel(f_name.xls, 'Sheet2')
            df3 = pd.read_excel(f_name.xls, 'Sheet3')
            df = pd.merge(df1, (pd.merge(df2, df3, how='outer', left_index=True, right_index=True), how='outer', left_index=True, right_index=True)
hhp
  • 1
  • @hhp, welcome to StackOverflow and thanks for an answer. However, please don't use the header (#) symbols just to make the text stand out more, it won't help. – tornikeo Sep 07 '20 at 03:45
0

Update:

import pandas as pd

writer = pd.ExcelFile('Downloads\wbook.xlsx')

writer.sheet_names
#prints names of the spreadsheets.

excelfile = pd.ExcelWriter('workb1.xlsx', engine='openpyxl')
for i in writer.sheet_names:
    df = pd.read_excel(writer, sheet_name=i, header=None)
    df[3] = df.iloc[:, 0] + ' ' + df.iloc[:, 1]
    df.to_excel(excelfile, sheet_name=i, index=False)
excelfile.save()

Creates a new spreadsheet with the combined columns A and B for each sheet in the original workbook.

Scott Boston
  • 147,308
  • 15
  • 139
  • 187