2

I have a Trie data structure, it searches 100 000 elements in a blink of an eye. However it only searches words which start with the searched string, for example "Fi" will find "Final" but won't find "GooFi" and I want it to return "GooFi" too. That is why I am here asking you guys if this is the correct structure in this case. I implemented it myself, wrote unit tests so it's working so far. What I need is a hint how my goal can be achieved, I don't want anyone to write the code for me, that is not why I am here. Anyways here's my search implementation:

public List<string> Seach(string word)
{
    List<string> results = new List<string>();
    this.DoSearch(this.Root, 0, new StringBuilder(word), results);
    return results;
}

private void DoSearch(TrieNode currentNode, int currentPositionInWord, StringBuilder word, List<string> results)
{
    if (currentPositionInWord >= word.Length)
    {
        this.DfsForAllWords(currentNode, word, results);
        return;
    }

    char currentChar = word[currentPositionInWord];

    bool containsKey = currentNode.Children.ContainsKey(currentChar);
    if (containsKey)
    {
        if (currentPositionInWord == word.Length - 1)
        {
            results.Add(word.ToString());
        }

        TrieNode child = currentNode.Children[currentChar];
        this.DoSearch(child, ++currentPositionInWord, word, results);
    }
}

private void DfsForAllWords(TrieNode currentNode, StringBuilder word, List<string> results)
{
    foreach (var node in currentNode.Children.ToArray())
    {
        word.Append(node.Value.Value);
        if (node.Value.IsWord)
        {
            results.Add(word.ToString());
        }

        this.DfsForAllWords(node.Value, word, results);
        word.Length--;
    }
}

Any help is greatly appreciated.

Georgi-it
  • 3,676
  • 1
  • 20
  • 23
  • 1
    How about - http://stackoverflow.com/questions/7600292/high-performance-contains-search-in-list-of-strings-in-c-sharp?rq=1 – Vadim Aug 05 '13 at 17:20
  • 2
    For searching for a substring a trie gives no benefit over a plain list or array, as you have to check all strings. For some ideas: http://stackoverflow.com/questions/1299168/fast-filtering-of-a-string-collection-by-substring/1299211#1299211 – Guffa Aug 05 '13 at 17:23
  • What about suffix tree then? – Georgi-it Aug 05 '13 at 20:58

2 Answers2

1

You can use a kind of index over all nodes.

Dictionary<char,List<TrieNode>> nodeIndex;

Now if you want to search for example for "Fi" iterate over nodeIndex and search as before. In case you found something on that iteration you have to prepend the found substrings with the string leading to the actual node.

public List<string> Seach(string word)
{
    List<string> results = new List<string>();

    foreach(var node in nodeIndex[word[0]])
    {
        List<string> nodeResults = new List<string>();

        this.DoSearch(node, 0, new StringBuilder(word), nodeResults);

        foreach(var nodeResult in nodeResults)
        {
            var text = string.Format("{0}{1}",node.Parent.Text, nodeResult);
            results.Add(node.Parent.Text, nodeResult);
        }
    }

    return results.Distinct().ToList();
}

Maybe there are some yet missing properties you have to implement.

abto
  • 1,583
  • 1
  • 12
  • 31
  • Thank you for posting, this came to my mind however it seems that this will waste memory as one of my goals is not to waste memory since I am going to work with a lot of data. Anyways I modified the Trie to search in a substring. I am going to post my code later so more people can benefit from it. – Georgi-it Nov 02 '13 at 16:31
0

https://github.com/gngeorgiev/Trie

Here is the repo of the Trie if anyone ever needs it. Supports prefix and substring search.

Georgi-it
  • 3,676
  • 1
  • 20
  • 23