Given a string in the format {Length}.{Text}
(such as 3.foo
), I want to determine which string, from a finite list, the given string is.
The reader starts at the 0-index and can seek forward (skipping characters if desired).
As an example, consider the following list:
10.disconnect
7.dispose
7.distort
The shortest way to determine which of those strings has been presented might look like:
if (reader.Current == "1")
{
// the word is "disconnect"
}
else
{
reader.MoveForward(5);
if (reader.Current == "p")
{
// the word is "dispose"
}
else
{
// the word is "distort"
}
}
The question has 2 parts, though I hope someone can just point me at the right algorithm or facet of information theory that I need to read more about.
1) Given a finite list of strings, what is the best way to generate logic that requires the least number of seeks & comparisons, on average, to determine which word was presented?
2) As with the first, but allowing weighting such that hotpaths can be accounted for. i.e. if the word "distort" is 4 times more likely than the words "disconnect" and "dispose", the logic shown above would be more performant on average if structured as:
reader.MoveForward(5);
if (reader.Current == "t")
{
// the word is distort
}
else //...
Note: I'm aware that the 6th character in the example set is unique so all you need to do to solve the example set is switch
on that character, but please assume there is a longer list of words.
Also, this isn't some homework assignment - I'm writing a parser/interception layer for the Guacamole protocol. I've looked at Binary Trees, Tries, Ulam's Game, and a few others, but none of those fit my requirements.