I'm looking to display all possible duplicate entries from my MySQL table from column name
.
For example, Adobe Photoshop and Photoshop should be shown as 'possible duplicates.'
Is this possible?
I'm looking to display all possible duplicate entries from my MySQL table from column name
.
For example, Adobe Photoshop and Photoshop should be shown as 'possible duplicates.'
Is this possible?
You can start with using SOUNDEX()
, this will probably do for what you need
The drawbacks of SOUNDEX()
are:
Example:
SELECT SOUNDEX('Microsoft')
SELECT SOUNDEX('Microsift')
SELECT SOUNDEX('Microsift Corporation')
SELECT SOUNDEX('Microsift Subsidary')
/* all of these return 'M262' */
For more advanced needs, I think you need to look at the Levenshtein distance (also called "edit distance") of two strings and work with a threshold. This is the more complex (=slower) solution, but it allows for greater flexibility.
Main drawback is, that you need both strings to calculate the distance between them. With SOUNDEX you can store a pre-calculated SOUNDEX in your table and compare/sort/group/filter on that. With the Levenshtein distance, you might find that the difference between "Microsoft" and "Nzcrosoft" is only 2, but it will take a lot more time to come to that result.
In any case, an example Levenshtein distance function for MySQL can be found at Levenshtein Distance as a MySQL Stored Function .