
I need to know whether a ternary search tree is better than a hash table.

I came across this in a reply to another question of mine, where someone said that ternary trees are often faster than hash tables. I found that hard to believe, so I decided to do a little research into it.

A website from Princeton appears to be the source of the belief. I took a look at the algorithm, which is described as O(log n + k), where n is the number of words stored and k is the length of the key.

It seems to me that the only way this could be faster is if you are often searching for elements that are not stored. Another thing that bothers me is that the non-contiguous memory accesses of walking a trie would tend to make you hit pages that have been swapped out, but whether this is a major effect could only be seen through benchmarks.

Now I know that there are probably pros and cons to both of them, and if so, I want to know what they are. Benchmarks are also helpful.
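
For concreteness, the kind of benchmark I have in mind is just timing a batch of lookups against each structure, roughly like this (a minimal sketch in Standard ML, to match the code elsewhere in this thread; lookupFn and keys are placeholders for whichever implementation and workload you measure):

(* Minimal timing sketch: run every key through a lookup function and
   report the elapsed wall-clock time.  lookupFn and keys are placeholders. *)
fun timeLookups (lookupFn, keys) =
  let
    val timer = Timer.startRealTimer ()
    val _ = List.app (fn k => ignore (lookupFn k)) keys
  in
    print ("elapsed: " ^ Time.toString (Timer.checkRealTimer timer) ^ "s\n")
  end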

Unknown
  • Given the context, you almost certainly want to compare hash tables with *purely functional* trees. Imperative balanced trees can be phrased in terms of arrays so they are a lot more performant than their purely functional counterparts (which is all that is available in Haskell). – J D May 06 '09 at 23:33

5 Answers


Here is what I gather from the Dr. Dobbs Article reachable from the Princeton link you gave:

  1. Ternary Search Trees are up to 10% faster than hash tables on some search problems. They are sometimes slower, depending heavily on the machine used.
  2. TSTs are a custom data structure tuned by two of the finest minds in computer science: Jon Bentley and Robert Sedgewick both wrote good textbooks and have done their share of practical programming. Hash tables are considered run-of-the-mill.
  3. The constants involved are significant, as Hao Wooi Lim says.
  4. Overall, this depends on the problem you are solving. The faster development time and the almost ubiquitous support for hash tables in many programming languages (see the sketch below) are often more important than a ten-percent improvement in run time.
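
To illustrate the last point: in SML/NJ, for example, a ready-made hash table ships with the smlnj-lib, so using one takes only a few lines. A minimal sketch (the HashTable and HashString names and signatures are taken from the smlnj-lib as I remember them, so check your installation):

(* smlnj-lib hash table usage sketch: string keys, int values.
   mkTable takes a hash function and an equality test, plus an initial
   size and an exception to raise on a failed HashTable.lookup. *)
exception NotFound

val tbl : (string, int) HashTable.hash_table =
  HashTable.mkTable (HashString.hashString, op =) (64, NotFound)

val () = HashTable.insert tbl ("apple", 1)
val result : int option = HashTable.find tbl "apple"   (* SOME 1 *)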
Yuval F

The only way to answer this question is empirically. The answer depends on the details of your implementation, what kinds of searches you do, what hardware you have, and what compiler you're using. You can copy the C code from Princeton. If you want to try a functional language, Standard ML has hash tables (look at SML/NJ), and here is some ML for ternary search trees:

(* Assumes a structure Key that provides an ord_key type, a compare
   function, and a sentinel value marking the end of a stored key
   (e.g. supplied as a functor argument). *)
type key = Key.ord_key
type item = Key.ord_key list

datatype set = NODE of { key : key, lt : set, eq : set, gt : set }
             | LEAF

val empty = LEAF

fun member (_, LEAF) = false
  | member (h::t, NODE {key, lt, eq, gt}) =
      (case Key.compare (h, key)
         of EQUAL   => member(t, eq)
          | LESS    => member(h::t, lt)
          | GREATER => member(h::t, gt))
  | member ([], NODE {key, lt, eq, gt}) =
      (case Key.compare (Key.sentinel, key)
         of EQUAL   => true
          | LESS    => member([], lt)
          | GREATER => member([], gt))

exception AlreadyPresent

fun insert(h::t, LEAF) =
      NODE { key = h, eq = insert(t, LEAF), lt = LEAF, gt = LEAF }
  | insert([], LEAF) =
      NODE { key = Key.sentinel, eq = LEAF, lt = LEAF, gt = LEAF }
  | insert(h::t, NODE {key, lt, eq, gt}) =
      (case Key.compare (h, key)
         of EQUAL   => NODE {key = key, lt = lt, gt = gt, eq = insert(t, eq)}
          | LESS    => NODE {key = key, lt = insert(h::t, lt), gt = gt, eq = eq}
          | GREATER => NODE {key = key, lt = lt, gt = insert(h::t, gt), eq = eq})
  | insert([], NODE {key, lt, eq, gt}) =
      (case Key.compare (Key.sentinel, key)
         of EQUAL   => raise AlreadyPresent
          | LESS    => NODE {key = key, lt = insert([], lt), gt = gt, eq = eq}
          | GREATER => NODE {key = key, lt = lt, gt = insert([], gt), eq = eq})

fun add(l, n) = insert(l, n) handle AlreadyPresent => n
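
To exercise the code above you also need the Key structure it refers to, in scope before the tree code (for instance as a functor argument). Here is one possible instantiation for character keys; it is my own assumption rather than part of the original code:

(* One possible Key structure: character elements, with the NUL
   character reserved as the end-of-key sentinel. *)
structure Key =
struct
  type ord_key = char
  val compare  = Char.compare
  val sentinel = #"\000"   (* must never occur inside a real key *)
end

(* Usage: keys are char lists, e.g. built with String.explode. *)
val s   = insert (String.explode "cat", empty)
val yes = member (String.explode "cat", s)   (* true *)
val no  = member (String.explode "ca", s)    (* false: "ca" itself was never inserted *)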
Norman Ramsey
  • Open addressed hash tables are typically 3-18x faster than closed addressed ones and are the default hash table implementation on .NET as a consequence. However, to the best of my knowledge none of today's implementations of OCaml, Standard ML or Haskell are even capable of expressing this data structure. – J D Mar 20 '10 at 03:20
  • @Jon: If you mean by "open addressing" what's meant in wikipedia (each bucket holds a value rather than a pointer to a separate chain), this is easy to implement in any of the ML dialects. What's a bit tedious and can't be expressed in ML is that you probably want the array elements to be *unboxed*---an important consideration for efficiency. Unboxed arrays can't be expressed in standard Haskell but can be expressed using a GHC extension. However I'm no expert in how this feature interacts with mutability (using the IO monad). – Norman Ramsey Mar 20 '10 at 18:25
  • Mutability in the context of unboxed arrays isn't so much the problem as perf bugs in GHC's garbage collector. They recently added a workaround for one long-standing bug but writing a single element is still O(n) and filling a hash table is, therefore, still O(n^2) in Haskell. Yesteryear's FPL implementations really suck at this. If you want to compare with hash tables objectively you must avoid them. – J D Mar 21 '10 at 04:10
  • @Jon: How utterly depressing. Which library module should I use to expose the bug? If filling a hash table is O(n^2), language implementors should be publicly shamed. Ironically, Objective Caml hash tables were O(n^2) for *years* because of a bug in the dynamic growth of the table as it filled. You wouldn't believe how much squawking it took to get this bug fixed---I guess Xavier did not like my patch. Amazing that simple data structures are hard to get right. – Norman Ramsey Mar 21 '10 at 18:17
  • Wow, that is depressing. Sounds like Haskell is going through the same growing pains now. GHC 6.10's Data.Hashtable exhibits this bug. Look at the malloc-based (!) k-nucleotide solution from the shootout for a workaround. One of the things I've really liked about F# is that Microsoft take on-board all of my advice and either act on it or give a good reason not to. I didn't have any trouble getting hash tables right in HLVM so these PL experts have no excuse! Incidentally, is there any chance of a new OSS FPL implementation that solves these kinds of problems as F# has done? – J D Mar 24 '10 at 01:56
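
Following up on the open-addressing exchange above: the probing scheme itself is easy to write with mutable arrays in Standard ML; what the MLs cannot express is keeping the occupied slots unboxed. A minimal, Basis-only linear-probing sketch (my own illustration, with fixed capacity, no resizing and no deletion; the Filled slots here are still boxed, which is exactly the limitation being discussed):

(* Open addressing with linear probing: string keys, int values.
   The probe loop assumes the table never fills up completely. *)
datatype slot = Empty | Filled of string * int

fun mkTable capacity : slot array = Array.array (capacity, Empty)

fun hash s =
  CharVector.foldl (fn (c, h) => 0w31 * h + Word.fromInt (Char.ord c)) 0w0 s

(* Return the index where the key lives, or the first empty slot where
   it would go, together with the stored value if it was found. *)
fun probe (tbl, k) =
  let
    val cap = Array.length tbl
    fun go i =
      case Array.sub (tbl, i) of
          Empty => (i, NONE)
        | Filled (k', v) =>
            if k' = k then (i, SOME v) else go ((i + 1) mod cap)
  in
    go (Word.toInt (Word.mod (hash k, Word.fromInt cap)))
  end

fun put (tbl, k, v) =
  let val (i, _) = probe (tbl, k)
  in Array.update (tbl, i, Filled (k, v)) end

fun get (tbl, k) = #2 (probe (tbl, k))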

log(n) grows slowly enough that it can often take a huge amount of data before a log(n) algorithm becomes slower than an O(1) algorithm, once you take the constant factors into account.

erikkallen
  • Well, it's not __that__ huge. If you take O(1) to be O(k) where k is the length of the key, then if you have k = 10, it will only take 1025 elements for a log(n) binary tree to be slower. For a ternary branching tree, it is approximately 60,000, which is large, but not large enough to not happen. – Unknown May 05 '09 at 08:22
  • @Unknown: You're assuming the constant factors are equal, but they are not. In practice, hash tables are faster than trees even at much smaller sizes than that. For example, with F# on .NET 4 here, a purely functional `Set` is only faster than the .NET `HashSet` for sets with <3 elements. – J D Apr 21 '11 at 10:06

This is pretty intriguing to me as well. But from the wiki article I read, it is claimed that a ternary tree is faster on most search problems. This is not surprising, because even though a hash table has O(1) lookup, you still need time to do the hashing. So it's not really O(1) but more like O(k), where k is the length of the key and does not depend on N (the number of items in the data structure). This may give the impression that a hash table is faster. However, if you're dealing with long keys, the k quickly adds up and there can come a point where the lookup speed of a hash table becomes slower than that of a ternary tree.
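
To make the O(k) point concrete: a typical string hash has to visit every character of the key, so a single hash already costs time proportional to the key length. A small Standard ML sketch of such a hash (the multiply-by-31 scheme is just a common choice, not anything taken from the articles above):

(* Hashing walks all k characters of the key, so one hash is O(k). *)
fun hashKey (s : string) : word =
  CharVector.foldl (fn (c, h) => 0w31 * h + Word.fromInt (Char.ord c)) 0w0 s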

Hao Wooi Lim
  • But the problem is that ternary trees are also dependent on k, especially if the element is in the tree. Oftentimes, I use hash tables where N (number of items) is __much__ larger than k (length of key). – Unknown May 05 '09 at 08:09

You may want to have a look at tstdb: http://code.google.com/p/tstdb/

This key-value store is based on a ternary search tree and is compatible with Memcached. Moreover, tstdb supports prefix searching and range queries, both of which the ternary search tree makes easy.
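
Prefix search falls out of the tree shape almost for free: walk down the tree consuming the prefix, then enumerate everything reachable through the final eq link. A sketch against the Standard ML set datatype from the answer above (the function names and the way the sentinel is detected are my own, not taken from tstdb):

(* Collect every key stored in a subtree, as lists of key elements. *)
fun collect (LEAF, acc) = acc
  | collect (NODE {key, lt, eq, gt}, acc) =
      let
        val acc = collect (gt, acc)
        val acc =
          (case Key.compare (key, Key.sentinel) of
               EQUAL => [] :: acc                (* a stored word ends here *)
             | _     => map (fn s => key :: s) (collect (eq, [])) @ acc)
      in
        collect (lt, acc)
      end

(* All stored keys that start with the given prefix. *)
fun withPrefix ([], t) = collect (t, [])
  | withPrefix (_ :: _, LEAF) = []
  | withPrefix (h :: p, NODE {key, lt, eq, gt}) =
      (case Key.compare (h, key) of
           EQUAL   => map (fn s => h :: s) (withPrefix (p, eq))
         | LESS    => withPrefix (h :: p, lt)
         | GREATER => withPrefix (h :: p, gt))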

user531097