0

I have to compare two large data tables with information of customers, only thing is compatible that is addresses of customers. But people makes typos. Is there a function where I can compare string with compability procent? Like if table1.address 90% equals table2.address then it will return true.

Ulrich Thomas Gabor
  • 6,584
  • 4
  • 27
  • 41
hln
  • 1,071
  • 4
  • 21
  • 37
  • You may use such thing as levenshtein distance. Brief overview & code available in [this answer](http://stackoverflow.com/a/6392380/2637490). But the fact, that your data set is large - should at least make you thin about using similarity again. If it's only for one query, only once - it's ok. But if there's an intention to make it part of some functionality - then it's a design flaw. You should instead create solid relation between those two tables – Alma Do May 28 '14 at 08:58

1 Answers1

0

The function you are looking for is a string distance calculation. As I know there is no built-in functions to do this in MySQL (fix me).

A widely used algorithm is the Levenshtein distance. This algorithm calculates the distance (the count of alterations have to be made to make 2 strings the same). You can find the pseudo code on Wikipedia

Here is an implementation in MySQL: http://openquery.com.au/blog/levenshtein-mysql-stored-function

If you divide the distance with the length of the longest string, you'll get the Levenshtein distance ratio. This is what you are looking for.

After you create the functions in your database, you can use it as a WHERE condition or as the part of the ON clause in JOINs.

Here is a link to an implementation of the Levenshtein distance ratio http://falseisnotnull.wordpress.com/2013/05/18/levenshtein-and-levenshtein_ratio-functions-for-mysql/

Pred
  • 8,789
  • 3
  • 26
  • 46