4

I am developing a Trie data-structure where each node represents a word. So words st, stack, stackoverflow and overflow will be arranged as

root
--st
---stack
-----stackoverflow
--overflow

My Trie uses a HashTable internally so all node lookup will take constant time. Following is the algorithm I came up to insert an item into the trie.

  1. Check item existence in the trie. If exist, return, else goto step2.
  2. Iterate each character in the key and check for the existence of the word. Do this until we get a node where the new value can be added as child. If no node found, it will be added under root node.
  3. After insertion, rearrange the siblings of the node under which the new node was inserted. This will walk through all the siblings and compare against the newly inserted node. If any of the node starts with same characters that new node have, it will be moved from there and added as child of new node.

I am not sure that this is the correct way of implementing a trie. Any suggestions or improvements are welcome.

Language used : C++

Navaneeth K N
  • 15,295
  • 38
  • 126
  • 184
  • @Vladimir: I meant `Trie` not `Tree`. I rolled back your changes. – Navaneeth K N Feb 11 '10 at 15:08
  • I have a Python trie implementation in this answer: http://stackoverflow.com/questions/55210/algorithm-to-generate-anagrams/1924561#1924561 – FogleBird Feb 12 '10 at 03:30
  • I have a doubt, Now if I jut want to retrieve or show word stack in suggestion box, how trie will differentiate between stack and stackoverflow? Please help. – vaishali33 Jan 08 '15 at 17:32

2 Answers2

7

The trie should look like this

                      ROOT
             overflow/    \st
                    O      O
                            \ack
                             O
                              \overflow
                               O

Normally you don't need to use hash tables as part of a trie; the trie itself is already an efficient index data structure. Of course you can do that.

But anyway, your step (2) should actually descend the trie during the search and not just query the hash function. In this way you find the insertion point readily and don't need to search for it later as a separate step.

I believe step (3) is wrong, you don't need to rearrange a trie and as a matter of fact you shouldn't be able to because it's only the additional string fragments that you store in the trie; see the picture above.

Antti Huima
  • 25,136
  • 3
  • 52
  • 71
  • Thanks. I know standard `trie` looks like how you explained. But I need each node to represent a whole word and not just suffixes. Descending trie to find the location to insert seems to be decent idea. Thanks again. – Navaneeth K N Feb 11 '10 at 15:06
  • It sounds like what you want is a HAT-trie. Look it up. – Justin Peel Feb 11 '10 at 15:15
  • @Appu: if your nodes have a parent link you can always get the full word pack even when the paths only contain "suffixes" – Ritsaert Hornstra Feb 11 '10 at 18:56
1

Following is the java code for insert algorithm.

public void insert(String s){
  Node current = root; 
  if(s.length()==0) //For an empty character
   current.marker=true;
  for(int i=0;i<s.length();i++){
   Node child = current.subNode(s.charAt(i));
   if(child!=null){ 
    current = child;
   }
   else{
    current.child.add(new Node(s.charAt(i)));
    current = current.subNode(s.charAt(i));
   }
   // Set marker to indicate end of the word
   if(i==s.length()-1)
    current.marker = true;
  } 
 } 

For a more detailed tutorial, refer here.

bragboy
  • 34,892
  • 30
  • 114
  • 171