I have a MySQL InnoDB table with a 'name' column (VARCHAR(255)) which I want users to be able to search against, returning all the matching rows. However, I can't just use a LIKE query because the search needs to allow for users typing in names which are similar to the available names (e.g. prefixing with 'The', or not knowing that the correct name includes an apostrophe).

Two examples are:

Name in DB: 'Rose and Crown'

Example possible searches which should match: 'Rose & Crown', 'Rose and Crown', 'rose and crown', 'The Rose and Crown'

Name in DB: 'Diver's Inn'

Example possible searches which should match: 'Divers' Inn', 'The Diver's Inn', 'Divers Inn'

I also want to be able to rank the results by a 'closest match' relevance, although I'm not sure how that would be done (edit distance perhaps?).
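
On the edit-distance idea, here is a minimal sketch of how such a ranking might look. It is written in Python purely for illustration; the function names `levenshtein` and `rank_by_distance` are hypothetical, and it assumes the names are loaded into application code, which should be cheap enough for a few thousand rows.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def rank_by_distance(query: str, names: list[str]) -> list[tuple[int, str]]:
    """Sort names by case-insensitive edit distance to the query, closest first."""
    q = query.lower()
    return sorted((levenshtein(q, name.lower()), name) for name in names)
```

Raw edit distance treats every character equally, so a leading 'The ' costs four edits on its own; it is probably most useful after the query and the stored names have been normalised in some way.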

It's unlikely that the table will ever grow beyond a few thousand rows, so a method which doesn't scale to millions of rows is fine. Once entered, the name value for a given row will not change, so if an expensive indexing operation is required that's not a problem.

Is there an existing tool which will perform this task? I've looked at Zend_Search_Lucene but that seems to focus on documents, whereas I'm only interested in searching a single column.

Edit: On SOUNDEX searching, this doesn't produce the results I want. For example:

SELECT soundex( 'the rose & crown' ) AS soundex1, soundex( 'rose and crown' ) AS soundex2;
soundex1    soundex2
T6265       R253265

Solution: In the end I've used Zend_Search_Lucene and just pretended that every name is in fact a document, which seems to achieve the result I want. I guess it's full text search in a way, even though each string is at most 3-4 words.
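
To illustrate the 'every name is a document' idea, here is a rough Python sketch of a tiny inverted index with a token-overlap score. It is not how Zend_Search_Lucene works internally; the functions `tokenize`, `build_index` and `search`, the Jaccard scoring and the sample rows are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase, drop apostrophes, expand '&' to 'and', then keep alphanumeric runs."""
    text = text.lower().replace("'", "").replace("&", " and ")
    return re.findall(r"[a-z0-9]+", text)


def build_index(names: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the set of row ids whose name contains it."""
    index = defaultdict(set)
    for row_id, name in names.items():
        for token in tokenize(name):
            index[token].add(row_id)
    return index


def search(query: str, index: dict[str, set[int]],
           names: dict[int, str]) -> list[tuple[float, str]]:
    """Return (score, name) pairs, best first, scored by Jaccard token overlap."""
    query_tokens = set(tokenize(query))
    candidates = set()
    for token in query_tokens:
        candidates |= index.get(token, set())  # rows sharing at least one token
    results = []
    for row_id in candidates:
        name_tokens = set(tokenize(names[row_id]))
        score = len(query_tokens & name_tokens) / len(query_tokens | name_tokens)
        results.append((score, names[row_id]))
    return sorted(results, reverse=True)
```

With rows such as `{1: "Rose and Crown", 2: "Diver's Inn", 3: "Crown Inn"}`, a query of 'The Rose & Crown' shares three of its four tokens with row 1 and only one with row 3, so row 1 ranks first.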

pwaring

2 Answers

Full Text Search (FTS) is the terminology for the database functionality you desire. There's native MySQL support as well as third-party support.

OMG Ponies
  • Native MySQL support won't work - as I said in the question my tables are InnoDB. Also, the user won't specify their query as 'Rose', 'Crown', it will be 'Rose & Crown' (for example). – pwaring May 29 '11 at 16:46
  • @pwaring: That's why I mentioned 3rd party support. Knowing the common terminology should make finding more information easier. – OMG Ponies May 29 '11 at 16:48

Here is an SO question that comes very close to what you want. While the answer is for PHP and MySQL, the general principle still applies:

How do I do a fuzzy match of company names in MYSQL with PHP for auto-complete?

Basically, you would use SOUNDEX to get what you want. If you need more power, support for longer strings, etc., you might want to look into Double Metaphone, which is an improvement over Metaphone and SOUNDEX:

http://aspell.net/metaphone/

http://www.atomodo.com/code/double-metaphone

Community
IAmTimCorey
  • The drawbacks of SOUNDEX seem a bit too great for me - especially the first letter being the same ('The Rose and Crown' and 'Rose & Crown' don't have the same first letter). – pwaring May 29 '11 at 16:45
  • @pwaring: You may bypass that by first stripping small common words like `a`, `and`, `the`, as well as apostrophes, quotes, commas, etc., from your strings, and then using Soundex. – ypercubeᵀᴹ May 29 '11 at 16:55
  • I could do, but that requires writing code to strip common words, punctuation etc. whereas I really want to say "here's the user's query, search against this column and return the results ordered by relevance". If I have to strip things from the query you can guarantee I'll miss something. :) – pwaring May 29 '11 at 16:59
  • which happens automatically (and is configurable, to a degree) when using Full Text Search (FTS) functionality. – OMG Ponies May 29 '11 at 17:00
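
The stripping step suggested in the comments can be sketched roughly as follows, in Python for illustration only. The `normalize` function and the `STOP_WORDS` list are hypothetical and would need extending for real data; the idea is simply to reduce both the stored name and the user's query to the same comparison key.

```python
import re

# Hypothetical stop-word list; extend as needed for real data.
STOP_WORDS = {"a", "an", "and", "the"}


def normalize(name: str) -> str:
    """Reduce a name to a comparison key: lowercase, no punctuation, no stop words."""
    text = name.lower().replace("&", " and ")
    text = re.sub(r"['\u2019]", "", text)     # drop straight and curly apostrophes
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # any other punctuation becomes a space
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)
```

Under these assumptions 'The Rose & Crown', 'Rose and Crown' and 'rose and crown' all reduce to the key 'rose crown', and the three 'Diver's Inn' variants from the question reduce to 'divers inn'. Because the names never change once entered, the key could be computed once and stored in an extra indexed column, with Soundex or an edit-distance ranking applied to the keys afterwards if looser matching is needed.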