Python - Compare two files and write changed rows and new rows to a new file

Question

Using Python (Pandas being acceptable) I need to compare file2 to file1 and output to file3 the rows from file2 that are either:

Present in file1 but have some changed values
Not present in file1

file1

ID,Format,Output,FontSize,isDefault
1,LP,LETTER,14,False
2,LP,LETTER,16,False
3,LP,LETTER,18,True
4,LP,LETTER,20,False
5,LP,LETTER,22,False

file2

ID,Format,Output,FontSize,isDefault
2,LP,LETTER,16,False
3,AU,LETTER,18,False
4,BR,LETTER,20,False
5,LP,LETTER,22,False
6,LP,LEDGER,24,False
7,LP,LEDGER,36,False

file3

ID,Format,Output,FontSize,isDefault
3,AU,LETTER,18,False
4,BR,LETTER,20,False
6,LP,LEDGER,24,False
7,LP,LEDGER,36,False

In this example the ID column is unique.

please describe your logic of `compare`. if it is very general file comparing, then there are lot of existing libraries. — Lei Yang, Feb 16 '22 at 15:10

score 0 · Answer 1 · answered Feb 16 '22 at 15:22

Assuming that the files are CSVs, and that the IDs are simply the row numbers, you could get a csv.reader() to get lists of the rows from the two files. Then you loop over them to see if the same row exists in both files; if not, you add them to a new list:

new_list = []
number_of_rows = min(len(list_from_file1), len(list_from_file2))
for i in range(number_of_rows):
    if list_from_file1[i] != list_from_file2[i]:
        new_list.append(list_from_file2[i])

That will copy all rows that exist in both files but with different content into new_list. You can then also copy all remaining rows from the longer file into new_list too, and write new_list into file3 with csv.writer.

The IDs are not the row numbers. ID 2 from file2 needs to be compared to ID 2 from file1 and if there is a change it needs to be written to file3. If there is a ID present in file2 but not in file 1 then it also needs to be written to file3. — DSora, Feb 16 '22 at 16:23

score 0 · Answer 2 · answered Feb 16 '22 at 17:29

Assuming you are reading csv files, try this code

import pandas as pd
import numpy as np
import csv

#Instead of below 2 lines of code, read data from files using the link at the end of code, below.
df1 = pd.DataFrame([[1,'LP','LETTER',14,False],[2,'LP','LETTER',16,False]], columns =['ID','Format','Output','FontSize','isDefault'])
df2 = pd.DataFrame([[2,'LP','LETTER',16,False],[3,'AU','LETTER',18,False]], columns =['ID','Format','Output','FontSize','isDefault'])

result = []

for index, row_df2 in df2.iterrows():
    if row_df2['ID'] in df1.ID.to_list():
        row_df1 = df1[df1.ID == row_df2['ID']].iloc[0] 
        if np.array_equal(row_df2.values,row_df1.values):
            result.append(row_df2.to_list()   )
    else:
        result.append(row_df2.to_list()  )

#Write to CSV file
with open('file3.csv', 'w', newline='') as myfile:
     wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
     wr.writerow(['ID','Format','Output','FontSize','isDefault'])
     for val in result:
        wr.writerow(val)

Import CSV file as a pandas DataFrame

Python - Compare two files and write changed rows and new rows to a new file

2 Answers2