Is there way easier way to capture the difference in data between two dataframe. I am trying to compare 2 dataframe and find the missing. I found this attached link but it's not satisfying my requirement. In that post they comparing using Id but I wanted to compare the entire rows of 2 dataframes and find the mismatch row whether it is from A or B.
Iterate over different dataframe
DF 1:
country from to flag
('GB', 'GB390065', 'GB66903', 'Y')
('FR', 'FR418458', 'FR765617', 'Y')
('FR', 'FR629810', 'FR855277', 'Y')
('FR', 'FR494010', 'FR332891', 'Y')
('FR', 'FR740500', 'FR907878', 'Y')
('FI', 'NB444135', 'NB234471', 'Y')
DF 2:
country from to flag
('GB', 'GB390065', 'GB66903', 'Y')
('FR', 'FR418458', 'FR7656', 'Y')
('FR', 'FR629810', 'FR855277', 'Y')
('FR', 'FR4910', 'FR33891', 'Y')
('FR', 'FR740500', 'FR907878', 'Y')
('FI', 'NB444135', 'NB234471', 'Y')
My expected output:
country from to flag Available
('FR', 'FR418458', 'FR7656', 'Y' df2)
('FR', 'FR4910', 'FR33891', 'Y' df2)
The challenge I face here is from column
and to column
are same data only so when I tried using merge function
, it is taking any one of the column and giving the result which is not meeting the requirement.
The output I got is
country_code from_cust_id ... to_cust_guid Merge_status
322443 DE DE180556 ... NaN left_only
322444 DE DE22191 ... NaN left_only
322445 DE DE2625168 ... NaN left_only
322446 DE DE17705 ... NaN left_only
322447 DE DE2556758 ... NaN left_only
[5 rows x 7 columns]
country_code from_cust_id ... to_cust_guid Merge_status
0 DE NaN ... DE485137 right_only
1 DE NaN ... DE467209 right_only
2 DE NaN ... DE651068 right_only
3 DE NaN ... DE459729 right_only
4 DE NaN ... DE448683 right_only