0

I have two excel files, master and child. All the column names are same in both the files.

I want to match column C of both the files, if there is a match then update all the columns for that specific row and if there is no match append the row at the end of the Master file.

I want to update data from child file to Master file based on this logic. So far I am able to update the Master file by copying all the data from Child to Master, but the data in Master file is getting replaced from Child file for that specified range. Any help will be much appreciated.

import openpyxl 

Master = openpyxl.load_workbook("Master.xlsx")
Child = openpyxl.load_workbook("Child.xlsx")

Master_File = Master["Sheet1"]
Child_File = Child["Sheet1"]

Function to copy rows and columns from Child File

def copyRange(startCol, startRow, endCol, endRow, sheet):
    rangeSelected = []
    #Loop through selected rows------------------------------------------------
    for i in range(startRow, endRow + 1, 1):
        #Appends to row selected list--------------------------------------------
        rowSelected = []
        for j in range(startCol, endCol + 1, 1):
            rowSelected.append(sheet.cell(row=i, column=j).value)
        #Adds the rowSelected list and nests inside rangesSelected list------------------------------
        rangeSelected.append(rowSelected) 
    return rangeSelected

Function to paste all the data to Master File

def pasteRange(startCol, startRow, endCol, endRow, sheetReceiving, copiedData):
    countRow = 0
    for i in range(startRow, endRow + 1, 1):
        countCol = 0
        for j in range(startCol, endCol + 1, 1):

            sheetReceiving.cell(row=i, column=j).value = copiedData[countRow][countCol]
            countCol += 1
        countRow += 1

Main function

def createData():
    wb = Workbook()
    print("Your data is being Processed.....")
    selectedRange = copyRange(1,10,39,45, Child_File)
    pastingRange = pasteRange(1,10,39,45, Master_File, selectedRange)
    Master.save(r"Final.xlsx")
    print("Range copied and pasted")





Final = createData()
  • Could you just use pandas and do a .update() between the two dataframes? – Chris Apr 02 '20 at 01:06
  • Even I was considering that. I got really confused of whether to use pandas initially by converting to dataframe, update the values and then use openpyxl to copy and paste the data. What do you thing, should I implement everything in pandas or should I use openpyxl and pandas combination? I heard that openpyxl and xlrd are really good when it comes to excel files, so thought of implementing in openpyxl. – StupendousEnzio Apr 02 '20 at 01:14
  • If you're not interested in cell formatting and such good ol pandas will do the trick – Chris Apr 02 '20 at 01:24

1 Answers1

0

This demonstrates how you can take two dataframes (which could both be created from .read_excel() in pandas), set your index column as the column you want to match on, and update the original with matches from the second dataframe, then write to xlsx again.

import pandas as pd
#df = pd.read_excel('myfile1.xlsx')
df = pd.DataFrame({'C': [1, 2, 3],
                   'D': [400, 500, 600]})
#new_df = pd.read_excel('myfile2.xlsx')
new_df = pd.DataFrame({'C': [1, 2, 6],
                       'D': [7, 8, 9]})

df.set_index('C', inplace=True)
df.update(new_df.set_index('C'))

df.update(new_df)

df.reset_index().to_excel('updated.xlsx', index=False)

Output

    C   D
0   1   8.0
1   2   9.0
2   3   600.0
Chris
  • 15,819
  • 3
  • 24
  • 37
  • Appreciate the code. I tried to implement and got this error '''ValueError: cannot reindex from a duplicate axis''' I checked and it was because there are similar columns in both the files. On stackoverflow, they suggest that, get rid of the duplicate columns. Here's the link - https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean – StupendousEnzio Apr 02 '20 at 02:39