0

I'm currently looking for a way to realize a partial word pattern algorithm in C#. The situation I'm in looks like follows:

I got a textfield for the search pattern. Every time the user enters or deletes a char in this field, an event triggers which re-runs the search algorithm. So in case I want to search for the word "face" in strings like

"Facebook", "Facelifting", ""Faceless Face" (whatever that should be) or in generally ANY real life sentences as strings,

the algorithm would first start running when typing "f" in the field. It then show the most relevant String on top of a list the strings are in. The second time it runs when "fa" is typed, and the list is sorted again. This goes on until "face" is completely typed in the textfield and the list is sorted again.

However I don't know what algorithm could be used. I tried the answer from Alain (Getting the closest string match), a simple Levenshtein-Distance algorithm as well as an self-made algorithm, which calculates the priority via

priority = (length_of_typed_pattern) * (amount_of_substr_matches)

In C#, the latter looks like this:

count = Regex.Matches(Regex.Escape(title), pattern).Count;
priority = pattern.Length * count;

The pattern as well as the title are composed of only lowercase letters. My conclusions so far:

  • Hamming distance won't make any sense since the strings are not the same length most of the time
  • The answer from Alain works fine, but only if at least one word completely matches (you only find a most relevant string/sentence when at least one word is equal with the pattern, so if you have "face" typed and there's a string containing the word "facebook", the string containing "facebook" is almost never a top priority

What other ideas could I try? The goal would be to sort the list of strings the best possible way in the earliest moment (with the fewest letters).

You can look at my implementations in the search-* branches of my repository on http://github.com/croemheld/sprung) in Sprung/WindowMatcher.cs and Sprung/Window.cs.

Thanks for your help.

CRoemheld
  • 889
  • 7
  • 26

1 Answers1

0

First of all you need to store frequency related to a string(number of times a particular string is searched) in some place to show most relevant one when searched. If you need to show say k most relevant entries so a Min Heap of size 'k' can be implemented.

Case 1- If a letter is pressed for the first time:-

Step (a) Read all the string starting from a Data base or dictionary and store in some data structure(Say DS1) with a FLAG_VALID(set to 1 initially) which shows that it is valid string for the present search characters(for first letter all the strings will be valid). As you read strings fill the Min Heap according to their Frequency and an element with certain frequency is inserted only when its frequency is greater than minimum one(i.e. the first element of min Heap).

Step (b) (This step is same for all case to show result) To show results you need to show elements in reverse order than Min Heap i.e. first element in Min Heap will have least priority, so basically we need to delete all elements one by one and show it from last to first. NOTE:- Min Heap will contain reference to a particular string and so the string and its frequency can be accessed at the same time.

Case 2- Inserting next letters in search box:

Step (a) Search through DS1 in which all strings are present and check FLAG_VALID first. If it is a valid string than compare the string from search box and the string from DS1. Set the flag accordingly(if it is a match-1 or not-0) and fill k-Min Heap as it is empty from last search as in Case 1.

Step (b) is as usual.

Case 3- Deleting a letter in search box:

It is similar to above cases but this time we will need to search for those strings also whose FALG_VALID is 0(i.e string which are invalid).

This is a crude searching method and can be improved using certain Data structure and tweaking the algorithm.