I have to compare two large data tables with information of customers, only thing is compatible that is addresses of customers. But people makes typos. Is there a function where I can compare string with compability procent? Like if table1.address 90% equals table2.address then it will return true.
-
You may use such thing as levenshtein distance. Brief overview & code available in [this answer](http://stackoverflow.com/a/6392380/2637490). But the fact, that your data set is large - should at least make you thin about using similarity again. If it's only for one query, only once - it's ok. But if there's an intention to make it part of some functionality - then it's a design flaw. You should instead create solid relation between those two tables – Alma Do May 28 '14 at 08:58
1 Answers
The function you are looking for is a string distance calculation. As I know there is no built-in functions to do this in MySQL (fix me).
A widely used algorithm is the Levenshtein distance
. This algorithm calculates the distance
(the count of alterations have to be made to make 2 strings the same).
You can find the pseudo code on Wikipedia
Here is an implementation in MySQL: http://openquery.com.au/blog/levenshtein-mysql-stored-function
If you divide the distance
with the length of the longest string
, you'll get the Levenshtein distance ratio
. This is what you are looking for.
After you create the functions in your database, you can use it as a WHERE
condition or as the part of the ON
clause in JOIN
s.
Here is a link to an implementation of the Levenshtein distance ratio
http://falseisnotnull.wordpress.com/2013/05/18/levenshtein-and-levenshtein_ratio-functions-for-mysql/

- 8,789
- 3
- 26
- 46