83

I remotely remember that tries don't store the whole data per node, only the suffix to the parent node.

Where trees do store the whole data, but only organize themselves based on a prefix.

So tries get smaller, which allows for example to compress dictionaries very well.

Is that really the only difference?

From actual applications I remember that tries are faster in range queries. There are even special solr/lucene trie fields to speed up range queries. But how is that so?

What is the actual difference and what are the advantages and disadvantages of tries and trees?

Matthias Braun
  • 32,039
  • 22
  • 142
  • 171
The Surrican
  • 29,118
  • 24
  • 122
  • 168

4 Answers4

102

A tree is a general structure of recursive nodes. There are many types of trees. Popular ones are binary tree and balanced tree. A Trie is a kind of tree, known by many names including prefix tree, digital search tree, and retrieval tree (hence the name 'trie').

Each kind of tree has a different purpose, structure and behaviour. For example, a binary tree stores a collection of comparable items (eg numbers). It can therefore be used to store a set of numbers, or to index other data that can be represented by numbers (eg objects that can be hashed). Its structure is sorted so it can be searched quickly to find a single item. Other tree structures, such as a balanced tree, are similar in principle.

A trie represents a sequence in its structure. It is very different in that it stores sequences of values rather than individual single values. Each level of recursion says 'what is the value of item I of the input list'. This is different to a binary tree which compares the single searched value to each node.

WEFX
  • 8,298
  • 8
  • 66
  • 102
Joe
  • 46,419
  • 33
  • 155
  • 245
  • Isn't trie like sort of lame? Doesn't binary tree beat trie in every respect except for storage space? – Pacerier Sep 04 '16 at 00:01
  • There's a place for every data structure. what about finding all strings with the same prefix? O(n) access? – Joe Sep 04 '16 at 15:35
  • 2
    Wouldn't tree do that too? Let's have 1 billion entries, finding prefix of 20. Trie does it in 20 steps. Tree does it in lg 1B / lg 2 = 30 steps. Now with the same 1B entries, we find prefix of 40. Tree still does it in 30 steps, but trie does it in 40. With prefix of 100, trie now takes 100 steps while tree still takes 30. – Pacerier Sep 07 '16 at 15:11
  • 1
    Wait what, to me Tries seem if anything better. Since lookups in a trie are `O(m)` in the length of the string, whereas in a tree they are `O(m * log n)` where n is the number of nodes, since comparing two strings takes worst case `O(m)` time, and you need to do worst case `O(log n)` comparisons. Whereas Tries get to do `O(1)` character comparison. – semicolon Jun 08 '17 at 15:36
  • 2
    @Pacerier Trie's have better asymptotics for basically all operations, since comparisons in a Trie are `O(1)` because you are comparing a constant size sub-part of the value, whereas they are `O(m)` in a Tree. – semicolon Jun 29 '17 at 15:09
8

A binary tree or a bst is typically used to store numerical values. The time complexity in a bst is O(log(n)) for insertion, deletion and searching. Each node in a binary tree has at most 2 child nodes.

Trie : Every node of trie consists of multiple branches. Each branch represents a possible character of keys. We need to mark the last node of every key as leaf node. A trie node field value will be used to distinguish the node as leaf node (there are other uses of the value field)

To learn about tries refer this topcoder tutorial. https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/

Harsh Gupta
  • 1,433
  • 1
  • 10
  • 7
8

In tree structure, we store whole words. There is a lot of overhead. If you search for antic, you would traverse each word and compare all the strings in antic. This takes up too much time and memory:

enter image description here

However, in tries, compression is the key. We store only the common prefixes. this will reduce the reduncancy.

enter image description here

Pros of tries:

  • The search time only depends on the length of the searched string.

  • Search misses only involve examining a few characters (in particular, just the longest common prefix between the search string and the corpus stored in the tree).

  • There are no collisions of unique keys in a trie.

  • There is no need to provide a hash function or to change hash functions as more keys are added to a trie.

  • A trie can provide an alphabetical ordering of the entries by key.

unfortunately, there is no perfect data structure. So even tries do have some downsides:

  • Tries can be slower than hash tables at looking up data whenever a container is too big to fit in memory. Hash tables would need fewer disk accesses, even down to a single access, while a trie would require O(m) disk reads for a string of length m.

  • Hash tables are usually allocated in a single big and contiguous chunk of memory, while trie nodes can span the whole heap. So, the former would better exploit the principle of locality.

  • A trie’s ideal use case is storing text strings. We could, in theory, stringify any value, from numbers to objects, and store it. Yet, if we were to store floating-point numbers, for instance, there are some edge cases that can produce long meaningless paths, such as periodic or transcendent numbers, or results of certain floating points operations such as 0.1+0.2, due to issues with double-precision representation.

  • Tries have memory overhead for nodes and references.

Use tries when you have to frequently perform prefix searches ( longestPrefix or keysWithPrefix ). Use hash tables when data is stored on slow supports like disk or whenever memory locality is important.

enter image description here

Reference

Yilmaz
  • 35,338
  • 10
  • 157
  • 202
5

Just got some insights from this talk, even through the Radix tree used in linux kernel is slight different to the one on wikipedia.

Trees only store keys, they don't store the value associated with the keys.

firo
  • 1,002
  • 12
  • 20