4

I have been searching over on how to append/insert/concat a row from one excel to another but with merged cells. I was not able to find what I am looking for.

What I need to get is this: enter image description here

and append to the very first row of this: enter image description here

I tried using pandas append() but it destroyed the arrangement of columns.

df = pd.DataFrame()

for f in ['merge1.xlsx', 'test1.xlsx']:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)

df.to_excel('test3.xlsx')

Is there way pandas could do it? I just need to literally insert the header to the top row.

Although I am still trying to find a way, it would actually be fine to me if this question had a duplicate as long as I can find answers or advice.

Ricky Aguilar
  • 329
  • 1
  • 8
  • 19
  • You could try this - https://stackoverflow.com/questions/25418620/adding-an-extra-column-to-the-combined-merged-excel-files?lq=1 – user2510479 May 06 '18 at 20:58

1 Answers1

7

You can use pd.read_excel to read in the workbook with the data you want, in your case that is 'test1.xlsx'. You could then utilize openpyxl.load_workbook() to open an existing workbook with the header, in your case that is 'merge1.xlsx'. Finally you could save the new workbbok by a new name ('test3.xlsx') without changing the two existing workbooks.

Below I've provided a fully reproducible example of how you can do this. To make this example fully reproducible, I create 'merge1.xlsx' and 'test1.xlsx'.

Please note that if in your 'merge1.xlsx', if you only have the header that you want and nothing else in the file, you can make use of the two lines I've left commented out below. This would just append your data from 'test1.xlsx' to the header in 'merge1.xlsx'. If this is the case then you can get rid of the two for llops at the end. Otherwise as in my example it's a bit more complicated.

In creating 'test3.xlsx', we loop through each row and we determine how many columns there are using len(df3.columns). In my example this is equal to two but this code would also work for a greater number of columns.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

df1 = pd.DataFrame()
writer = pd.ExcelWriter('merge1.xlsx') #xlsxwriter engine
df1.to_excel(writer, sheet_name='Sheet1')
ws = writer.sheets['Sheet1']
ws.merge_range('A1:C1', 'This is a merged cell')
ws.write('A3', 'some string I might not want in other workbooks')
writer.save()

df2 = pd.DataFrame({'col_1': [1,2,3,4,5,6], 'col_2': ['A','B','C','D','E','F']})
writer = pd.ExcelWriter('test1.xlsx')
df2.to_excel(writer, sheet_name='Sheet1')
writer.save()

df3 = pd.read_excel('test1.xlsx')
wb = load_workbook('merge1.xlsx')
ws = wb['Sheet1']
#for row in dataframe_to_rows(df3):
#    ws.append(row)
column = 2
for item in list(df3.columns.values):
    ws.cell(2, column=column).value = str(item)
    column = column + 1
for row_index, row in df3.iterrows():
    ws.cell(row=row_index+3, column=1).value = row_index #comment out to remove index
    for i in range(0, len(df3.columns)):
        ws.cell(row=row_index+3, column=i+2).value = row[i]

wb.save("test3.xlsx")

Expected Output of the 3 Workbooks:

Expected Output

patrickjlong1
  • 3,683
  • 1
  • 18
  • 32
  • Thank you for your answer. I was able to use your codes with some changes. Mind if I ask is there a parameter for .save() to not include/remove index? – Ricky Aguilar May 08 '18 at 09:46
  • 1
    I was able to remove the index by commeting `# ws.cell(row=row_index+3, column=1).value = int(row_index)` and setting the column of this line to 1 `ws.cell(row=row_index+3, column=1).value = int(row['col_1'])` – Ricky Aguilar May 08 '18 at 11:27
  • Great, you’re correct. Commenting out that line will remove the index. – patrickjlong1 May 08 '18 at 13:37
  • 1
    Thanks! Last question, for the last loop, is there a way to iterate it with all the existing columns? I have a 72 defined columns, this would not make my code look good if I write each one of them. Each column have a unique name. I am thinking of doing a for loop for the column but I am having a problem on how to iterate the unique column names. – Ricky Aguilar May 08 '18 at 13:39
  • Good Question. I'm going to update the answer to reflect my response to this. Essentially we could determine how many columns there are using `len(df3.columns)` . and then loop through each column within each row. – patrickjlong1 May 08 '18 at 15:10
  • 1
    I've updated the answer to reflect your question. Outputs remain the same in my example. – patrickjlong1 May 08 '18 at 15:18