
Given a list of strings:

strings[0] = "Mary Brown";
strings[1] = "Sally Green";
strings[2] = "Lucy Purple";

Given an input string:

x = "Mary Brown is a nice person";

How can one determine that the first string matches x better than the rest?

In my situation, x is not guaranteed to start with the matching string; the match could also appear in the middle of the text. And x might say "Mark Brown is a nice person" instead of "Mary Brown is a nice person", yet "Mary Brown" should still be the closest match.

NOTE: The answer doesn't have to use Regex. I'm looking for a C# answer.

101010

1 Answer


I would split both the search text and each input on spaces and count the matching words. Then order the inputs by that count, descending, and take the text of the first one.

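using System;
using System.Linq;

var inputs = new[] { "Mary Brown", "Sally Green", "Lucy Purple" };
var searchText = "Mary Brown is a nice person";

// Split the search text into words.
var words = searchText.Split(' ');
var result = inputs.Select(text => new
    {
        // Total occurrences, in the search text, of each word of this input.
        MatchCount = text.Split(' ')
            .Sum(input => words.Count(word => word == input)),
        Text = text
    })
    .OrderByDescending(a => a.MatchCount)  // highest count first; ties keep input order
    .Select(a => a.Text)
    .FirstOrDefault();                      // null when inputs is empty

Console.WriteLine(result);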

Output:

Mary Brown
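To check the question's second case, where the text says "Mark Brown" instead of "Mary Brown", the same word-count logic can be wrapped in a small function. This is only a sketch: the name BestMatch is hypothetical, and it keeps the exact, case-sensitive comparison of space-separated words used above. "Mary Brown" still wins because "Brown" matches while no word of the other inputs does.

```csharp
using System;
using System.Linq;

// Hypothetical helper: returns the input that shares the most space-separated
// words with searchText (exact, case-sensitive), or null if inputs is empty.
static string? BestMatch(string[] inputs, string searchText)
{
    var words = searchText.Split(' ');
    return inputs
        .Select(text => new
        {
            // Total occurrences, in searchText, of each word of this input.
            MatchCount = text.Split(' ').Sum(input => words.Count(word => word == input)),
            Text = text
        })
        .OrderByDescending(a => a.MatchCount)  // stable sort: ties keep input order
        .Select(a => a.Text)
        .FirstOrDefault();
}

var inputs = new[] { "Mary Brown", "Sally Green", "Lucy Purple" };

Console.WriteLine(BestMatch(inputs, "Mary Brown is a nice person")); // Mary Brown
Console.WriteLine(BestMatch(inputs, "Mark Brown is a nice person")); // Mary Brown
```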

PS

For better results, the word == input comparison can be replaced with a string similarity algorithm, like the one in this post.
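The post referred to above is not available here, so the following is only a sketch of one common choice, Levenshtein (edit) distance, swapped in for the exact word == input comparison. Each word of a candidate earns a similarity between 0 and 1 against its closest word in the search text, computed as 1 - distance / (length of the longer word), and the candidate with the highest total wins. The names Levenshtein, Similarity and FuzzyBestMatch are hypothetical.

```csharp
using System;
using System.Linq;

// Dynamic-programming Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions needed to turn a into b.
static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;
    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
        }
    return d[a.Length, b.Length];
}

// Similarity in [0, 1]: 1 for identical words, 0 when every character differs.
static double Similarity(string a, string b)
{
    int longest = Math.Max(a.Length, b.Length);
    return longest == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / longest;
}

// Hypothetical fuzzy variant of the answer's query: each word of a candidate
// scores its best similarity against any word of the search text.
static string? FuzzyBestMatch(string[] inputs, string searchText)
{
    var words = searchText.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    return inputs
        .Select(text => new
        {
            Score = text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
                .Sum(input => words.Select(word => Similarity(word, input)).DefaultIfEmpty(0.0).Max()),
            Text = text
        })
        .OrderByDescending(a => a.Score)  // highest total similarity first
        .Select(a => a.Text)
        .FirstOrDefault();
}

var inputs = new[] { "Mary Brown", "Sally Green", "Lucy Purple" };

Console.WriteLine(FuzzyBestMatch(inputs, "Mark Brown is a nice person")); // Mary Brown
Console.WriteLine(FuzzyBestMatch(inputs, "Lucey Purpel is nice"));        // Lucy Purple
```

With the exact word == input comparison, every candidate scores 0 against "Lucey Purpel is nice", so "Mary Brown" would be returned only because OrderByDescending keeps tied items in their original order; the similarity score instead picks "Lucy Purple". Both versions still compare case-sensitively.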

Yuliam Chandra