4

Are there any algorithms that would find the closest match to a string from a collection of strings? For example:

string_to_match = 'What color is the sky?'

strings = [
  'What colour is the sea?', 
  'What colour is the sky?', 
  'What colour is grass?', 
  'What colour is earth?'
]

answer = method_using_string_matching_algorithm(string_to_match, strings)
answer # returns strings[1] 'What colour is the sky?'
Todd A. Jacobs
  • 81,402
  • 15
  • 141
  • 199
Neil
  • 189
  • 1
  • 1
  • 9

1 Answers1

5

The search terms you're looking for are "string distance algorithms" and "approximate string matching." A quick check of Google turns up interesting options such as:

  • Sift3 Distance
  • Levenshtein Distance
  • Optimal String Alignment Distance
  • Damerau-Levenshtein Distance
  • Qwerty Keyboard Distance

Some useful links include:

As of this writing, Debian-based Linux distributions also include agrep and TRE-agrep in their repositories.

Todd A. Jacobs
  • 81,402
  • 15
  • 141
  • 199
  • 1
    thanks for the keywords and links. I've been using the Ruby Amatch gem which includes Levenshtein http://flori.github.com/amatch/ – Neil May 31 '12 at 17:40