Pandas find duplicates with reversed values between columns with other attributes

Question

I have been trying to identify and eliminate duplicates from a CSV input file.

Consider the input file below. I need to find any "Fruit-Vegetable" pairs that match another row once the two values are swapped (for example, Apple/Potato and Potato/Apple).

If the two rows also match on the other attributes (Country, City and Postcode), the first occurrence should be kept and the later duplicate eliminated.

If the other attributes do not match, both rows are kept and a new "Duplicate" column labels them (for example "Duplicate1"); rows without a reversed counterpart get "NA".

After trying several approaches, I haven't been able to get past the first step, which is identifying the duplicates.

If someone can help me with that, I should be able to proceed with the rest. Thanks!

Input:

Fruit	Vegetable	Country	City	Postcode
Apple	Potato	Australia	Sydney	2000
Potato	Apple	Australia	Sydney	2000
Orange	Onion	Australia	Melbourne	3000
Grapes	Beans	Australia	Perth	6000
Beans	Grapes	Australia	Sydney	2000

Output:

Fruit	Vegetable	Country	City	Postcode	Duplicate
Apple	Potato	Australia	Sydney	2000	NA
Orange	Onion	Australia	Melbourne	3000	NA
Grapes	Beans	Australia	Perth	6000	Duplicate1
Beans	Grapes	Australia	Sydney	2000	Duplicate1

I've tried reversing the strings and merging them to find the duplicates, but the duplicates are not being eliminated. I've also tried various similar answers on Stack Overflow, but haven't found the right logic to get this working.

Sort the two column values (for each row) then perform a classical drop_duplicates — mozway, Jul 28 '23 at 04:46
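A minimal sketch of what the comment suggests, assuming the data is already loaded into a DataFrame with the column names from the question. The helper columns `key_a` and `key_b` and the "DuplicateN" numbering scheme are assumptions made for illustration, not something the question or the comment specifies.

```python
import pandas as pd

# Sample data from the question (in practice this would come from pd.read_csv).
df = pd.DataFrame(
    [
        ["Apple", "Potato", "Australia", "Sydney", 2000],
        ["Potato", "Apple", "Australia", "Sydney", 2000],
        ["Orange", "Onion", "Australia", "Melbourne", 3000],
        ["Grapes", "Beans", "Australia", "Perth", 6000],
        ["Beans", "Grapes", "Australia", "Sydney", 2000],
    ],
    columns=["Fruit", "Vegetable", "Country", "City", "Postcode"],
)

# Build an order-independent key: sort the two values within each row so that
# Apple/Potato and Potato/Apple both become ("Apple", "Potato").
keys = [tuple(sorted(pair)) for pair in zip(df["Fruit"], df["Vegetable"])]
df["key_a"] = [k[0] for k in keys]  # hypothetical helper column
df["key_b"] = [k[1] for k in keys]  # hypothetical helper column

# Step 1: rows that match on the sorted pair AND the other attributes are
# true duplicates, so keep only the first occurrence.
other = ["Country", "City", "Postcode"]
deduped = df.drop_duplicates(subset=["key_a", "key_b"] + other, keep="first").copy()

# Step 2: rows that still share a sorted pair differ in the other attributes,
# so keep them all and label each such group Duplicate1, Duplicate2, ...
is_dup = deduped.duplicated(subset=["key_a", "key_b"], keep=False)
deduped["Duplicate"] = "NA"
group_no = deduped[is_dup].groupby(["key_a", "key_b"], sort=False).ngroup() + 1
deduped.loc[is_dup, "Duplicate"] = "Duplicate" + group_no.astype(str)

# Drop the helper columns and renumber the rows.
result = deduped.drop(columns=["key_a", "key_b"]).reset_index(drop=True)
print(result)
```

On the sample input this keeps Apple/Potato once, leaves Orange/Onion as "NA", and labels both Grapes/Beans rows "Duplicate1". For large files, `numpy.sort` over the two columns with `axis=1` may be faster than the list comprehension.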

