
I have been trying to identify and eliminate duplicates from a CSV input file.

Consider the input file below. I need to find any "Fruit"/"Vegetable" pairs that match another row's pair when reversed (for example, Apple/Potato and Potato/Apple).

If the reversed pair matches and the other attributes (Country, City and Postcode) are also the same, only the first occurrence should be kept and the later duplicate removed.

If the other attributes do not match, both rows are kept and a new "Duplicate" column labels them as a group (for example, Duplicate1). Rows with no reversed match get NA in that column.

I have tried several approaches, but I haven't managed the first step, which is identifying the duplicates.

If someone can help me with that, I should be able to work out the rest. Thanks!

Input:

Fruit Vegetable Country City Postcode
Apple Potato Australia Sydney 2000
Potato Apple Australia Sydney 2000
Orange Onion Australia Melbourne 3000
Grapes Beans Australia Perth 6000
Beans Grapes Australia Sydney 2000

Output:

Fruit Vegetable Country City Postcode Duplicate
Apple Potato Australia Sydney 2000 NA
Orange Onion Australia Melbourne 3000 NA
Grapes Beans Australia Perth 6000 Duplicate1
Beans Grapes Australia Sydney 2000 Duplicate1

I've tried reversing the string and merging the result to find the duplicates, but the duplicates are not being eliminated. I've also tried several similar answers on Stack Overflow, but I couldn't find the right logic.
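
For reference, here is a minimal sketch of the rules described above in plain Python. It is only one possible approach, not a confirmed solution. It assumes the real file is comma-separated (the sample above is shown with spaces), and the names `SAMPLE`, `pair_key` and `label_reversed_pairs` are made up for illustration. The idea is to sort each Fruit/Vegetable pair so that a pair and its reverse share the same key, then compare the other attributes within each group. Note that under this assumption two rows with the same pair in the same order are also treated as a match.

```python
import csv
import io

# Hypothetical comma-separated version of the sample input from the question.
SAMPLE = """\
Fruit,Vegetable,Country,City,Postcode
Apple,Potato,Australia,Sydney,2000
Potato,Apple,Australia,Sydney,2000
Orange,Onion,Australia,Melbourne,3000
Grapes,Beans,Australia,Perth,6000
Beans,Grapes,Australia,Sydney,2000
"""

OTHER_COLUMNS = ("Country", "City", "Postcode")


def pair_key(row):
    # Sorting the two values makes the key order-independent, so
    # Apple/Potato and Potato/Apple produce the same key.
    return tuple(sorted((row["Fruit"], row["Vegetable"])))


def label_reversed_pairs(rows):
    # Group row indices by the order-independent pair key.
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(pair_key(row), []).append(i)

    drop = set()    # indices of rows to remove (full duplicates)
    labels = {}     # index -> "DuplicateN" for rows that only share the pair
    next_label = 1
    for indices in groups.values():
        seen_attrs = set()
        kept = []
        for i in indices:
            attrs = tuple(rows[i][c] for c in OTHER_COLUMNS)
            if attrs in seen_attrs:
                # Same pair and same other attributes: keep the first only.
                drop.add(i)
            else:
                seen_attrs.add(attrs)
                kept.append(i)
        if len(kept) > 1:
            # Same pair but different attributes: label the whole group.
            for i in kept:
                labels[i] = f"Duplicate{next_label}"
            next_label += 1

    return [
        {**row, "Duplicate": labels.get(i, "NA")}
        for i, row in enumerate(rows)
        if i not in drop
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = label_reversed_pairs(rows)
for row in result:
    print(" ".join(row.values()))
```

On the sample data this keeps the first Apple/Potato row, drops the reversed copy, and labels both Grapes/Beans rows as Duplicate1, which matches the expected output above.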
