1

So the over all on this is I have two different systems and in both systems I have customers, unfortunately both systems allow you to type in the business name freehand so you end up with the example below.

Column A has a value of "St John Baptist Church"
Column B has a value of "John Baptist St Church"

What I need to come up with is a query that can compare the two columns to find the most closely matched values. From there I plan to write a web app where I can have someone go through and validate all of the entries. I would enter in some example of what I have done, but unfortunately I honestly dont even know if what I am asking for is even possible. I would think it is though in this day and age I am sure I am not the first one to try to attempt this.

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
scripter78
  • 1,117
  • 3
  • 22
  • 50

1 Answers1

1

You could try and create a script something like this php script to help you:

$words = array();
$duplicates = array();

function _compare($value, $key, $array) {
    global $duplicates;

    $diff = array_diff($array, $value);

    if (!empty($diff)) {
        $duplicates[$key] = array_keys($diff);
    }

    return $diff;
}

$mysqli = new mysqli('localhost', 'username', 'password', 'database');
$query = "SELECT id, business_name FROM table";

if ($result = $mysqli->query($query)) {
    while ($row = $result->fetch_object()) {
        $pattern = '#[^\w\s]+#i';
        $row->business_name = preg_replace($pattern, '', $row->business_name);
        $_words = explode(' ', $row->business_name);
        $diff = array_walk($words, '_compare', $_words);    
        $words[$row->id][] = $_words;

        $result->close();
    }
}

$mysqli->close();

This is not tested but you need something like this, because I don't think this is possible with SQL alone.

---------- EDIT ----------

Or you could do a research on what the guys in the comment recommend Levenshtein distance in T-SQL

Hope it helps, good luck!

Community
  • 1
  • 1
Azrael
  • 193
  • 8