I am working in SAS and I have a data-set with 2 columns and I want not only to remove the duplicates, but also the "almost" duplicates. The data looks like this:
**Brand Product**
Coca Cola Coca Cola Light
Coca Cola Coca Cola Lgt
Coca Cola Cocacolalight
Coca Cola Coca Cola Vanila
Pepsi Pepsi Zero
Pepsi Pepsi Zro
i do not know if it is actually possible, but what I would like the file to look like after removing the "duplicates", is like that:
**Brand Product**
Coca Cola Coca Cola Light
Coca Cola Coca Cola Vanila
Pepsi Pepsi Zero
I don't have a preference if the final table will have e.g. "Pepsi Zero" or "Pepsi Zro" as long as there are no "duplicate" values.
I was thinking if there was a way to compare the e.g. first 4-5 letters and if they are the same then to consider them as duplicates. But of course I am open to suggestions. If there is a way to be done even in excel I would be interested to hear it.