I have two CSV files: CSV1 CSV2
CSV2 has 32 rows, CSV1 has 25 rows, so I need to know the 7 exact rows that are in CSV2, but not CSV1. Currently, my set() code is removing the duplicates in both CSVs which I need to not happen. Additionally, because the additional rows found in CSV2 cause the order of the rows to not match the 25 rows in CSV1, it automatically outputs all additional rows following the first missing row as missing. Here is my current logic:
import csv
import pandas as pd
import numpy as np
data1 = pd.read_csv('CSV2.csv')
data2 = pd.read_csv('CSV1.csv')
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df = df2[~df2.eq(df1).all(axis=1)]
print(df)
Here is what I want the outcome to be:
I need to ensure duplicates are not removed from either file so I can see all 7 rows present in CSV2 that are not in CSV1. Thank you for your help.