
I have a few sentences in a List<string> loaded from file.txt. Example:

  • Walt Disney refused to allow Alfred Hitchcock to film at Disneyland in the early 1960s because he had made “that disgusting movie Psycho.”
  • Pumbaa in The Lion King was the first character to fart in a Disney movie.
  • Walt Disney paid the animators on Snow White and the Seven Dwarfs $5 for any gag that made it into the final version of the movie.
  • The cake in the movie Sixteen Candles is made of cardboard.

etc.

I want to display in my listBox only those sentences that contain any of the words entered in the search box. Example: when I enter "disney seven dwarfs", it should display "Walt Disney paid the animators on Snow White and the Seven Dwarfs $5 for any gag that made it into the final version of the movie." at the top of the list. It shouldn't display "The cake in the movie Sixteen Candles is made of cardboard.", because that sentence does not contain any of the entered words. In short: the result with the highest number of matching words should appear at the top.

public static IEnumerable<string> SplitSearchWords(string str)
{
    int charIndex = 0;
    int wordStart = 0;
    while (charIndex < str.Length)
    {
        wordStart = charIndex;
        if (char.IsLetterOrDigit(str[charIndex]))
        {
            while (charIndex < str.Length && char.IsLetterOrDigit(str[charIndex])) charIndex++;
            yield return str.Substring(wordStart, charIndex - wordStart);
        }
        else
        {
            while (charIndex < str.Length && !char.IsLetterOrDigit(str[charIndex])) charIndex++;
        }
    }
}

public static int CalculateSearchRelevance(string searchItem, IEnumerable<string> searchWords)
{
  var searchItemWords = SplitSearchWords(searchItem);
  return searchWords.Intersect(searchItemWords, StringComparer.OrdinalIgnoreCase).Count();
}

var myFile = File.ReadAllLines("file.txt");
var myList = new List<string>(myFile);

var query = textBox1.Text;
var items = myList;

var searchWords = SplitSearchWords(query).Distinct(StringComparer.OrdinalIgnoreCase).ToList();
var sortedItems = items.OrderByDescending(s => CalculateSearchRelevance(s, searchWords)).ToList();
Uwe Keim

3 Answers

0

You can use the String.Contains() method and build a custom function to determine the match percentage.
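As a rough illustration of that idea, here is a minimal sketch. The class and the `MatchPercentage` helper are hypothetical names, not something from the question. One assumption to flag: `String.Contains(string)` is case-sensitive, so the sketch uses `IndexOf` with `StringComparison.OrdinalIgnoreCase` instead, which also means it matches substrings rather than whole words.

```csharp
using System;
using System.Linq;

public static class MatchPercentageSketch
{
    // Hypothetical helper: percentage (0..100) of the search words that occur
    // somewhere in the sentence, compared case-insensitively as substrings.
    public static double MatchPercentage(string sentence, string[] searchWords)
    {
        if (searchWords.Length == 0) return 0.0;
        int hits = searchWords.Count(w =>
            sentence.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);
        return 100.0 * hits / searchWords.Length;
    }

    public static void Main()
    {
        var words = new[] { "disney", "seven", "dwarfs" };

        // All three search words occur in this sentence.
        Console.WriteLine(MatchPercentage(
            "Walt Disney paid the animators on Snow White and the Seven Dwarfs $5 for any gag that made it into the final version of the movie.",
            words)); // 100

        // None of the search words occur in this one.
        Console.WriteLine(MatchPercentage(
            "The cake in the movie Sixteen Candles is made of cardboard.",
            words)); // 0
    }
}
```

Sentences with a percentage of 0 can then be dropped and the rest sorted by descending percentage.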

0

You need to check if there are any words matching before sorting:

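// A null separator splits on whitespace; RemoveEmptyEntries keeps repeated spaces from producing empty search words.
var searchWords = query.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Distinct(StringComparer.OrdinalIgnoreCase).ToList();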

var matchingItems = items.Where(s => CalculateSearchRelevance(s, searchWords) > 0);
var sortedItems = matchingItems.OrderByDescending(s => CalculateSearchRelevance(s, searchWords)).ToList();

The Where filter is there because you don't want to show "The cake in the movie Sixteen Candles is made of cardboard."

To show the highest number of matches you need some kind of state: for example, a Dictionary that stores the match count of each item for further processing, or a small class that holds them.

Or

Recalculate the number of matches for the first item of your sortedItems.

Or

Use LINQ's Select to create an anonymous type, as in ZiggZagg's answer, which is even more elegant ;)
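The Dictionary option mentioned above could be sketched roughly like this. The class name, `CountMatches` and `Rank` are hypothetical, and the simple `Split` on a few punctuation characters is only a stand-in for the question's `SplitSearchWords`, so treat it as an illustration under those assumptions rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RelevanceCacheSketch
{
    // Counts how many distinct search words occur as whole words in the item
    // (case-insensitive). The separator list is a simplification.
    public static int CountMatches(string item, IEnumerable<string> searchWords)
    {
        var itemWords = item.Split(new[] { ' ', '.', ',', '"' },
                                   StringSplitOptions.RemoveEmptyEntries);
        return searchWords.Intersect(itemWords, StringComparer.OrdinalIgnoreCase).Count();
    }

    // Computes each relevance once, keeps it in a Dictionary, then filters
    // and sorts using that stored state.
    public static List<KeyValuePair<string, int>> Rank(
        IEnumerable<string> items, IEnumerable<string> searchWords)
    {
        var words = searchWords.ToList();
        var relevance = new Dictionary<string, int>();
        foreach (var item in items)
            relevance[item] = CountMatches(item, words); // duplicate items just overwrite

        return relevance
            .Where(kv => kv.Value > 0)           // drop items without any match
            .OrderByDescending(kv => kv.Value)   // best match first
            .ToList();
    }

    public static void Main()
    {
        var items = new[]
        {
            "Pumbaa in The Lion King was the first character to fart in a Disney movie.",
            "Walt Disney paid the animators on Snow White and the Seven Dwarfs $5 for any gag that made it into the final version of the movie.",
            "The cake in the movie Sixteen Candles is made of cardboard."
        };

        foreach (var kv in Rank(items, new[] { "disney", "seven", "dwarfs" }))
            Console.WriteLine($"{kv.Value}: {kv.Key}");
    }
}
```

Because the count is stored next to each sentence, the highest number of matches is simply the Value of the first entry, with no need to recalculate it.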

EDIT: Solution for your problem in the comment

Intersect takes an IEqualityComparer as one of its arguments. The default implementation of IEqualityComparer for strings uses Equals, so one solution would be to write your own implementation of IEqualityComparer that uses a substring check (as Contains does) to decide whether two strings count as equal.

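class MyComparer : IEqualityComparer<string>
{
    // Treats x as "equal" to y when x contains y (case-insensitive).
    // Note: this is not symmetric, so the result depends on the order
    // in which Intersect passes its arguments.
    public bool Equals(string x, string y)
    {
        return x.IndexOf(y, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // A constant hash code forces Equals to be called for every pair.
    public int GetHashCode(string obj)
    {
        return 0;
    }
}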

public static int CalculateSearchRelevance(string searchItem, IEnumerable<string> searchWords)
{
    var searchItemWords = searchItem.Split(null).ToList();
    return searchWords.Intersect(searchItemWords, new MyComparer()).Count();
}

Another way is to rewrite CalculateSearchRelevance like this:

public static int CalculateSearchRelevance(string searchItem, IEnumerable<string> searchWords)
{
    var searchItemWords = searchItem.Split(null);
    return searchItemWords.Where(w => searchWords.Any(searchWord => w.IndexOf(searchWord, StringComparison.OrdinalIgnoreCase) >= 0)).Count();
}

With the implementation above, "disne" or "disney" will match both "Disney" and "Disneyland". I used IndexOf instead of Contains to make the comparison case-insensitive.
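To make the partial-match behaviour concrete, here is a small self-contained sketch of the same logic. The class name and the shortened example sentence are hypothetical, and it splits on a single space with `RemoveEmptyEntries` instead of `Split(null)`, which is my simplification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartialMatchSketch
{
    // Counts the words of searchItem that contain at least one search word
    // as a case-insensitive substring.
    public static int CalculateSearchRelevance(string searchItem, IEnumerable<string> searchWords)
    {
        var searchItemWords = searchItem.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return searchItemWords.Count(w => searchWords.Any(
            searchWord => w.IndexOf(searchWord, StringComparison.OrdinalIgnoreCase) >= 0));
    }

    public static void Main()
    {
        var sentence = "Walt Disney refused to allow Alfred Hitchcock to film at Disneyland.";

        // "disne" is a substring of both "Disney" and "Disneyland."
        Console.WriteLine(CalculateSearchRelevance(sentence, new[] { "disne" }));      // 2

        // "disneyland" only occurs in "Disneyland."
        Console.WriteLine(CalculateSearchRelevance(sentence, new[] { "disneyland" })); // 1

        // No word contains "psycho".
        Console.WriteLine(CalculateSearchRelevance(sentence, new[] { "psycho" }));     // 0
    }
}
```

Note that this version counts matching words in the sentence, not matching search words, so a single search word can contribute more than 1 to the relevance.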

Please note that if you want more advanced, search-engine-like features, you probably want to take a look at Lucene, or at Elasticsearch, which is built on top of Lucene. You get all the features of a search engine out of the box :) And many giants use it.

https://github.com/apache/lucenenet

https://github.com/elastic/elasticsearch-net

Konrad
0

The problem is that you always include every result, even the ones that are not relevant. You can keep only the items with at least one match by checking that the relevance is > 0.

var sortedItems = items
    .Select(s => new {Text = s, Relevance = CalculateSearchRelevance(s, searchWords)})
    .Where(textWithRelevance => textWithRelevance.Relevance > 0)
    .OrderByDescending(textWithRelevance => textWithRelevance.Relevance)
    .ToList();

foreach (var sortedTextWithRelevance in sortedItems)
{
    Console.WriteLine($"Relevance: {sortedTextWithRelevance.Relevance}, Text: {sortedTextWithRelevance.Text} ");
}
ZiggZagg