I am working with a database of approximately 100k entries and want to find all similar names in it. I concatenate each customer's first and last name into one column. I am currently using SOUNDEX, but the results are far too fuzzy. Filtering those fuzzy results in my PHP code is very slow, because there are so many Soundex classes and so many entries in the database. So I hope there is another way to get better matches than Soundex gives me.
My query:
SELECT soundex(full_name) AS soundex,
full_name AS customer_name
FROM (SELECT CONCAT(cu.first_name, ' ', cu.last_name) AS full_name
FROM `customers` AS cu
WHERE cu.`status` = 1) a
ORDER BY soundex(full_name);
So I take all the concatenated names and list them ordered by their Soundex code.
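To make that grouping concrete, here is a minimal Python sketch of what the query effectively produces: names bucketed by Soundex code. It assumes the standard 4-character American Soundex. MySQL's SOUNDEX() returns codes of arbitrary length and uses a slightly different variant, so the codes can differ for some names. The helper names `soundex` and `group_by_soundex` are hypothetical.

```python
from collections import defaultdict

# Standard Soundex digit for each consonant; vowels, y, h and w get no digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name: str) -> str:
    """4-character American Soundex (hypothetical helper; may differ from MySQL)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":
            # Vowels reset the previous code; h and w do not.
            prev = code
    return (result + "000")[:4]


def group_by_soundex(names):
    """Bucket names by Soundex code, like ORDER BY soundex(full_name)."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(sorted(groups.items()))


print(group_by_soundex(["Rupert Smith", "Anna Lee", "Robert Smith"]))
```

With only four characters, the code of a concatenated full name is mostly determined by the first name, which is one reason such buckets feel fuzzy.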
Is there a way to use DIFFERENCE(soundex, soundex) efficiently, other than cross joining the whole table and comparing every name with every other name? Or is there a good way to filter out names that are not very similar?
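One way to avoid the full cross join is "blocking": compare only names that share a Soundex code, then keep the pairs whose edit distance is small. The sketch below assumes the rows come back from the query above as `(soundex, customer_name)` tuples. The 0.8 similarity threshold and the helpers `levenshtein` and `similar_pairs` are hypothetical choices, not anything MySQL provides. MySQL itself has no built-in DIFFERENCE() (that function exists in SQL Server), so this filtering would happen in application code.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, cost 1)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similar_pairs(rows, min_similarity=0.8):
    """rows: (soundex_code, name) tuples. Only names in the same Soundex
    bucket are compared, so the work grows with the squared bucket sizes
    rather than with the square of the whole table."""
    buckets = defaultdict(list)
    for code, name in rows:
        buckets[code].append(name)
    matches = []
    for names in buckets.values():
        for a, b in combinations(names, 2):
            x, y = a.lower(), b.lower()
            longest = max(len(x), len(y)) or 1
            similarity = 1 - levenshtein(x, y) / longest
            if similarity >= min_similarity:
                matches.append((a, b, round(similarity, 3)))
    return matches


rows = [("R163", "Robert Smith"), ("R163", "Rupert Smith"),
        ("R163", "Robert Jones"), ("A540", "Anna Lee")]
print(similar_pairs(rows))  # [('Robert Smith', 'Rupert Smith', 0.833)]
```

The trade-off is recall. Soundex keeps the first letter, so pairs like "Catherine" and "Katherine" land in different buckets and are never compared.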