Using C#, from a list of strings, how do we find the one that most closely matches a test string?

Question

Given a list of strings:

strings[0] = "Mary Brown";
strings[1] = "Sally Green";
strings[2] = "Lucy Purple";

Given an input string:

x = "Mary Brown is a nice person";

How does one determine that the first string matches x more closely than the rest?

In my situation, the input string is not guaranteed to start with the answer; the match could also appear in the middle of it. The input might also say "Mark Brown is a nice person" instead of "Mary Brown is a nice person", yet "Mary Brown" should still be the closest match.

NOTE: The answer doesn't have to use Regex. I'm looking for a C# answer.

Can you use a library like Lucene? It is designed to do ranking. — Doug Chamberlain, Sep 11 '14 at 03:40
similar question here: http://stackoverflow.com/questions/643538/fastest-way-to-find-most-similar-string-to-an-input — Setyo N, Sep 11 '14 at 03:51
The answer ultimately depends on how you define "most closely match". Unless you define it, the question is nonsensical. — koryakinp, Sep 11 '14 at 04:21

score 2 · Answer 1 · edited May 23 '17 at 12:29

I would split both the search text and each input by spaces and count the matching words. Then order the inputs by that count in descending order and take the first one.

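using System;
using System.Linq;

var inputs = new[] { "Mary Brown", "Sally Green", "Lucy Purple" };
var searchText = "Mary Brown is a nice person";

// Split the search text into words once, up front.
var words = searchText.Split(' ');

// Score each input by how many of its words occur in the search text,
// then take the input with the highest score (null if inputs is empty).
var result = inputs.Select(text => new
    {
        MatchCount = text.Split(' ')
            .Sum(input => words.Count(word => word == input)),
        Text = text
    })
    .OrderByDescending(a => a.MatchCount)
    .Select(a => a.Text)
    .FirstOrDefault();

Console.WriteLine(result);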

Output:

Mary Brown

PS

To get better results, the word == input comparison can be replaced with a string similarity algorithm, like the one in this post.
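As one possible illustration (not necessarily the algorithm from the linked post), the sketch below swaps the exact comparison for Levenshtein edit distance, so "Mark" still counts as a match for "Mary". The helper names Levenshtein, WordsMatch and BestMatch, and the threshold of at most 1 edit per word, are hypothetical choices made for this example; tune them for your data.

```csharp
using System;
using System.Linq;

// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn a into b.
static int Levenshtein(string a, string b)
{
    // Two-row dynamic programming table; prev holds the previous row.
    var prev = new int[b.Length + 1];
    var curr = new int[b.Length + 1];
    for (int j = 0; j <= b.Length; j++) prev[j] = j;

    for (int i = 1; i <= a.Length; i++)
    {
        curr[0] = i;
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        }
        (prev, curr) = (curr, prev); // reuse arrays instead of reallocating
    }
    return prev[b.Length];
}

// Hypothetical rule: two words "match" if they differ by at most
// maxDistance edits, ignoring case.
static bool WordsMatch(string x, string y, int maxDistance = 1) =>
    Levenshtein(x.ToLowerInvariant(), y.ToLowerInvariant()) <= maxDistance;

// Same scoring as the answer above, but with fuzzy word comparison instead of ==.
static string? BestMatch(string[] candidates, string searchText)
{
    var words = searchText.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    return candidates
        .Select(text => new
        {
            MatchCount = text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
                .Sum(input => words.Count(word => WordsMatch(word, input))),
            Text = text
        })
        .OrderByDescending(a => a.MatchCount)
        .Select(a => a.Text)
        .FirstOrDefault();
}

var inputs = new[] { "Mary Brown", "Sally Green", "Lucy Purple" };
Console.WriteLine(BestMatch(inputs, "Mark Brown is a nice person")); // prints "Mary Brown"
```

A fixed edit threshold is crude: with a limit of 1, very short words such as "a" and "I" will match each other. Scaling the threshold by word length, or scoring by total distance instead of a match count, are reasonable refinements.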
