6

I have two strings (they're going to be descriptions in a simple database eventually), let's say they're

  1. String A: "Apple orange coconut lime jimmy buffet"
  2. String B: "Car bicycle skateboard"

What I'm looking for is this. I want a function that will have the input "cocnut", and have the output be "String A"

We could have differences in capitalization, and the spelling won't always be spot on. The goal is a 'quick and dirty' search if you will.

Are there any .net (or third party), or recommend 'likeness algorithms' for strings, so I could check that the input has a 'pretty close fragment' and return it? My database is going to have liek 50 entries, tops.

Steve
  • 213,761
  • 22
  • 232
  • 286
greggorob64
  • 2,487
  • 2
  • 27
  • 55

1 Answers1

12

What you’re searching for is known as the edit distance between two strings. There exist plenty of implementations – here’s one from Stack Overflow itself.

Since you’re searching for only part of a string what you want is a locally optimal match rather than a global match as computed by this method.

This is known as the local alignment problem and once again it’s easily solvable by an almost identical algorithm – the only thing that changes is the initialisation (we don’t penalise whatever comes before the search string) and the selection of the optimum value (we don’t penalise whatever comes after the search string).

Community
  • 1
  • 1
Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • I think I figured out my solution, I'm going to use the levenshtien algorithm. Since most of my data is simple and space seperated, i'll just compare my string to a space-seperated version of the database entries, and take the highest words as the result. – greggorob64 Mar 08 '13 at 21:33