import pandas as pd
file1 = 'Total.xlsx'
df1 = pd.read_excel(file1)
file2 = 'Recent.xlsx'
df2 = pd.read_excel(file2)
non_matching_rows = []
for index1, row1 in df1.iterrows():
row_matches = False
for index2, row2 in df2.iterrows():
if row1.equals(row2):
row_matches = True
break
if not row_matches:
non_matching_rows.append(row1)
non_matching_df = pd.DataFrame(non_matching_rows)
display(non_matching_df)
print(non_matching_df.count())
Total.xlsx contains almost 40k records and Recent.xlsx has almost 36k records. I needed to find the remaining 4k records that is unique in Total.xlsx. I tried the above code, but it is not working for the entire excel files. When I tried to reduce the records in both the files, it can process and produce accurate results [only to 550 records]. Any file more than 500 records it is not working (I tried chunk size also, didn't win). Any suggestions??