0

I have a list of company names that I need to locate in the Companies table in our database.

The list of company names that I have are NOT precise. They could be misspelled. Some contain special characters, etc.

I'm trying to create a "fuzzy search" SELECT statement that will find the most likely matches in the Companies table. I'm doing some "sanitization" before sending names that need to be matched by trimming spaces in the beginning or end of words, removing words like "The", etc.

I tried the FREETEXT feature in SQL Server but the results were horrible. It pretty much returned all the rows in the Companies table.

For exmaple, I did a search for 1-800-Flowers but got all kinds of unrelated hits. What approach can I use to find the most relevant matches?

Sam
  • 26,817
  • 58
  • 206
  • 383
  • There is no correct answer to this. There's not even a "best" answer to it. – Tab Alleman Jun 11 '18 at 17:06
  • you could try [Fuzzy Lookup Transformation][1] in SSIS or [Excel fuzzyu matching][2] [1]: https://learn.microsoft.com/en-us/sql/integration-services/data-flow/transformations/fuzzy-lookup-transformation?view=sql-server-2017 [2]: https://www.microsoft.com/en-ca/download/details.aspx?id=15011 – user1443098 Jun 11 '18 at 17:12
  • Have you used LIKE eg: LIKE '%1%800%Flowers%'? – Shankar Duvvuri Jun 11 '18 at 17:13
  • @ShankarDuvvuri `LIKE` is a very simple approach and fails easily. For example, if the order of words don't match, `LIKE` fails. Say, I'm searching for "Company, The" but in my database it's "The Company", then `LIKE` fails to return it. – Sam Jun 11 '18 at 17:24
  • @user1443098 Fuzzy Lookup Transformation seems interesting but never used it before. I'm looking into it so thank you for the suggestion. – Sam Jun 11 '18 at 17:25

1 Answers1

0

You may be able to implement something using the Levenshtein distance.

I found an implementation of the function in SQL Server here.

Hope this helps.

Michael Ziluck
  • 599
  • 6
  • 19