2

I know that searching for a given prefix in a trie is in O(M) where M is the maximum length of any word inserted into the trie.

But what is the time-complexity of retrieving all elements that start with a specific prefix?

I thought about a possible answer:

O(M+n) where n is the number of words starting with the prefix. The idea: Searching for the prefix is in O(M). Then I have a subtrie that contains all words starting with the given prefix and I only have to traverse it. Problem (maybe): There are more nodes than words in a prefix tree. But maybe there is some form of efficient storing so that I don't have to look at them?

flackbash
  • 488
  • 2
  • 16

1 Answers1

0

Imagine you have a trie where all words begin with the same prefix of length L. Now, reporting all words beginning with that prefix requires you to do a complete search on the trie, which will take time O(n), where n is the total number of nodes in the trie.

The challenge here is that searching for all words in a trie starting with some prefix might require you to search a number of nodes unrelated to the length of the prefix. It depends what nodes are in the trie.

If you know you're going to do searches like this often, you might want to consider using a Patricia trie (also called a radix trie). This is a trie where each edge can be labeled with multiple characters and nodes with one child and one parent aren't allowed. If you store the trie this way, then you can show that the time required to visit all the nodes corresponding to words in a particular subtrie is O(z), where z is the number of words found. The reason this works is that every non-leaf node in a Patricia trie has at least two children, so a subtrie containing z leaf nodes will have z internal nodes (it's a good exercise to check this), so you can discover all the leaves after just scanning O(z) nodes. The problem is that this will find the nodes for the words rather than listing off the words themselves. If you're okay with that, this can be a huge time saver.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • Thanks for the answer. Is there a way to get an upper bound on the number of nodes in the tree where all words begin with the same prefix? So that I can express the complexity for example in terms of the length of the longest word in the trie and the size of the alphabet or the number of words in the trie or something? – flackbash Nov 01 '16 at 08:02