3

Concerning Trie from Wikipedia:

[Compared to HashTable] Tries support ordered iteration

I am not sure what is meant here first of all. Is it the same as sorted iteration?
Additionally is this supposed to be an inherent feature of this datastructure?
I mean if one uses e.g. a HashSet for the children of each node in a Trie we can get O(1) access trying to find the children to branch or by using a LinkedList you save space on the nodes.
Perhaps I am way wrong but from my point of view the only way to support ordered iteration would be to keep array of all keys per node even unused.
Isn't this approach bad?
And one last point:
If the ordered here is related to insertion order (and not sorted) how would we get that since we insert each word (using the chars as keys) to the corresponding node but I can't see how this gives us information on the insertion order?
Could someone help me clear these things in my mind out?
Thank you.

Samuel Edwin Ward
  • 6,526
  • 3
  • 34
  • 62
Cratylus
  • 52,998
  • 69
  • 209
  • 339
  • Just a guess. But ordered iteration of strings in a trie would mean iteration of each string in the trie according to the alphabetical order. If the trie contains bat sat cat and hat, you can easily iterate over the trie and get those strings in alphabetical order. E.g. bat, cat, hat sat. – Justin May 24 '12 at 15:32
  • Yes I understand that.But if this must be provide as a feature of Trie then the implementation is not supposed to do optimizations for space (using `LinkedList`) or time (using `HashSet`) for storing children? – Cratylus May 24 '12 at 15:38
  • As long as those structures are ordered then they could he used to store children while holding that invariant. I would think a character to node map would suffice for a child data structure or a sorted linked list. – Justin May 24 '12 at 15:43
  • I am not clear on your approach "character to node map".Would it be possible to give a small sample code on what you mean? – Cratylus May 24 '12 at 15:47
  • I was think in terms of a HashMap in Java data structure, where the key is a character and the value is the node who's value is the character. – Justin May 24 '12 at 15:51
  • This `HashMap` would reresent the children? – Cratylus May 24 '12 at 15:55
  • Yea, it would contain the children (node in my example) – Justin May 24 '12 at 15:57
  • `HashMap` is not an ordered data structure though – Cratylus May 24 '12 at 16:02
  • Yea, you are right but a TreeMap in Java allows ordered key iteration. Keys are stored using a red-black tree, I believe. – Justin May 24 '12 at 16:05
  • Yes but then you agree that ordered iteration "imposes" implementation specific requirements. Right? I would have to used a `TreeMap` to implement this then – Cratylus May 24 '12 at 16:15
  • Yea, I agree. I'm not sure it is a requirement of a trie but it seems like a common trait of most implementations. – Justin May 24 '12 at 16:22

2 Answers2

3

What they mean is that a depth-first search of the trie will yield a lexicographically ordered output of the strings in the trie.

But yes, you are right, this assumes that all sibling nodes on a given level are visited in lexicographical order and this is far from given, especially with large alphabets, where it makes sense to implement the child node table via a hash table.

In summary, I think your doubts are justified and that the Wikipedia article is wrong.

But it’s worth noting that a lexicographically ordered iteration over a trie is affordable even if the child nodes aren’t ordered, since ordering them in while iterating is expected to be relatively cheap – there will be few child nodes in each trie node so the overall performance of iterating the trie will still be in O(n) expected time – in contrast with a hash table, where ordered iteration effectively implies sorting all elements, an O(n log n) operation.

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
0

It means sorted. You can't get the insertion order out a trie without extra effort. Not exactly sure what you mean by inherent feature. The implementation needs to provide it (or provide access to the internals), but it is straight-forward to do so.

Stefan Haustein
  • 18,427
  • 3
  • 36
  • 51
  • 1
    But if it is an inherent feature then these means that "optimizations" like using a `HashSet` or `LinkedList` to store the children are out of the question? – Cratylus May 24 '12 at 15:35
  • I don't see how you could optimize using a linked list. When using hash sets at the nodes, you could still iterate over all "potential" buckets and get performance similar to a regular Trie – Stefan Haustein May 24 '12 at 15:38
  • 1
    Using a `LinkedList` to store the children saves space as I don't have to have e.g. an `array[256]` to store characters that will not be present in input text. Same reasoning for `HashSet` in terms of time – Cratylus May 24 '12 at 15:41
  • But what would that be good for -- you'll have linear access performance at each level... That would be worse than using balanced regular trees in every aspect. – Stefan Haustein May 24 '12 at 15:43
  • It is not linear access! It is `O(K)` where `K` is the size of the alphabet which is a constant e.g. `O(256)`=`O(1)` – Cratylus May 24 '12 at 15:46
  • Ok, but I'd argue that this is still too big to be be ignored, in particular given that you still need to multiply this with the maximum word length M (also assumed to be constant). If we assume M=20 you get to O(5120) which is still constant but in practice not something you'd ignore. – Stefan Haustein May 24 '12 at 16:07
  • Well, I am far from expert in asymptotic notation but I don't think that `O(5120` evaluates as a "hidden" large constant especially in the input `text` is in the order of 2 million strings – Cratylus May 24 '12 at 16:17
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/11701/discussion-between-stefan-haustein-and-user384706) – Stefan Haustein May 24 '12 at 16:19