126

What are the advantages of binary search trees over hash tables?

Hash tables can look up any element in Theta(1) time, and it is just as easy to add an element... but I'm not sure of the advantages going the other way around.

hippietrail
Devoted
  • For hash tables, what are the running times for find(), insert() and remove()? Theta(1), Theta(1) and Theta(1), right? – Devoted Nov 08 '10 at 22:13
  • 9
    Almost always, yes. If you run into a lot of collisions, then those times might grow up to O(n). – Christian Mann Nov 08 '10 at 22:16
  • 1
    These times also depend on your hashing function. If for some strange reason it's not O(1), obviously your operations will have a minimum bound of whatever efficiency your hash function runs at. – Christian Mann Nov 09 '10 at 02:36
  • I would say the biggest advantage of a BST is that it is a sorted data structure. A detailed use case is already listed [here](http://stackoverflow.com/questions/5010854/concrete-examples-of-using-binary-search-trees). – Yuantao Feb 16 '15 at 21:41

18 Answers

158

One advantage that no one else has pointed out is that a binary search tree allows you to do range searches efficiently.

In order to illustrate my idea, I want to make an extreme case. Say you want to get all the elements whose keys are between 0 and 5000, and actually there is only one such element, plus 10000 other elements whose keys are not in the range. A BST can do range searches quite efficiently, since it does not search a subtree that cannot possibly contain the answer.

By contrast, how can you do range searches in a hash table? You either need to iterate over every bucket, which is O(n), or you have to look up whether each of 1, 2, 3, 4, ... up to 5000 exists. (And what if the keys between 0 and 5000 form an infinite set? For example, the keys could be decimals.)
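For illustration, here is a minimal C++ sketch (my own addition, not part of the original answer), using std::map as the BST: the two boundary lookups are O(log n) descents, and the loop only touches elements inside the range.

```cpp
#include <iostream>
#include <map>
#include <string>

int main() {
    // std::map is an ordered tree, so a range query is two O(log n)
    // descents plus a walk over just the matching elements.
    std::map<int, std::string> m = {
        {7, "in range"}, {6000, "out"}, {9000, "out"}, {-42, "out"}};

    auto first = m.lower_bound(0);     // first key >= 0
    auto last  = m.upper_bound(5000);  // first key > 5000

    for (auto it = first; it != last; ++it)
        std::cout << it->first << " -> " << it->second << '\n';

    // std::unordered_map has no equivalent: you would have to scan every
    // bucket or probe each candidate key individually.
}
```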

Alex
  • 14
    BSTs do range searches efficiently! For me this is the best answer in terms of practical and algorithmic approach. – ady Dec 11 '13 at 10:51
  • 7
    Wow, this really explains why trees are so associated with databases; their benefits are most visible when you need to perform key-based filtering. With hash maps, you need to loop over all keys to solve "find all items with key between 1000 and 3290". – Dmytro Mar 30 '18 at 13:25
  • Now I know what range search is and how a BST will outperform a Hash Table implementation. – Saad Aug 20 '22 at 04:20
  • @Dmytro Set intersection of indexes is extremely fast thanks to the sort-merge-join algorithm that steps to the max of the lower bounds of each index. This is an inner join, where a fast lower bound is essential. – Samuel Danielson Nov 28 '22 at 01:31
105

Remember that Binary Search Trees (reference-based) are memory-efficient. They do not reserve more memory than they need to.

For instance, if a hash function has a range R(h) = 0...100, then you need to allocate an array of 100 (pointers-to) elements, even if you are just hashing 20 elements. If you were to use a binary search tree to store the same information, you would only allocate as much space as you needed, as well as some metadata about links.

Christian Mann
  • 40
    It is not true that the entire range of hash function outputs has to exist in the array. The hash values can simply be modded by the length of the array to allow a smaller array. Of course, the ultimate number of elements being added may not be known, so the hash table may still allocate more space than is necessary. Binary search trees can waste just as much memory or more, though. Linked implementations need space for at least two additional pointers per element (three if using a parent pointer), and array-based BSTs can waste a lot of memory for unfilled portions of the tree. – Solaraeus Jul 11 '12 at 22:28
  • 5
    @Solaraeus: Array-based BST's are the best to compare to hash tables and they are no more wasteful than hash tables. You can also expand a BST with little more than a memory copy, compared to recomputing the whole table. – Guvante Sep 20 '12 at 18:11
  • @Guvante: BSTs being "no more wasteful" means that memory efficiency is not a *dis*advantage of BSTs, but it is not an advantage either. The question is about what the advantages are. – Chris Dodd Sep 11 '22 at 02:12
  • @ChrisDodd: My comment was just that in this context linked list BSTs aren't useful to compare against. Also array based BSTs are "load factor" (50% isn't uncommon) more space efficient as hash tables need a significant number of empty spaces to avoid becoming a linear search due to collisions. – Guvante Sep 13 '22 at 17:26
83

One "advantage" of a binary tree is that it may be traversed to list off all elements in order. This is not impossible with a Hash table but is not a normal operation one design into a hashed structure.

NealB
59

In addition to all the other good comments:

Hash tables in general have better cache behavior, requiring fewer memory reads, compared to a binary tree. For a hash table you normally only incur a single read before you have access to a reference holding your data. The binary tree, if it is a balanced variant, requires something on the order of k * lg(n) memory reads for some constant k.

On the other hand, if an adversary knows your hash function, the adversary can force your hash table into collisions, greatly hampering its performance. The workaround is to choose the hash function randomly from a family, but a BST does not have this disadvantage. Also, when the hash table's load grows too much, you often tend to enlarge and reallocate the hash table, which may be an expensive operation. The BST has simpler behavior here and does not tend to suddenly allocate a lot of data and do a rehashing operation.
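To illustrate the "choose the hash function randomly from a family" idea, here is a rough sketch of my own (the SeededHash functor is made up for illustration; a production version would use a proper keyed hash such as SipHash): it mixes a per-process random seed into std::hash, so an attacker who knows the code but not the seed cannot precompute colliding keys.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <unordered_map>

// Hypothetical functor: mixes a random seed into std::hash so collisions
// cannot be precomputed by someone who only knows the source code.
struct SeededHash {
    std::size_t seed = std::random_device{}();
    std::size_t operator()(const std::string& s) const {
        std::size_t h = std::hash<std::string>{}(s);
        // hash_combine-style mixing of the secret seed into the hash value
        return h ^ (seed + 0x9e3779b9u + (h << 6) + (h >> 2));
    }
};

int main() {
    // Used exactly like a normal unordered_map; only the hasher differs.
    std::unordered_map<std::string, int, SeededHash> counts;
    counts["hello"] += 1;
}
```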

Trees tend to be the ultimate average data structure. They can act as lists, can easily be split for parallel operation, have fast removal, insertion and lookup on the order of O(lg n). They do nothing particularly well, but they don't have any excessively bad behavior either.

Finally, BSTs are much easier to implement in (pure) functional languages compared to hash-tables and they do not require destructive updates to be implemented (the persistence argument by Pascal above).

I GIVE CRAP ANSWERS
  • 3
    `BSTs are much easier to implement in (pure) functional languages compared to hash-tables` - really? I want to learn a functional language now! – nawfal Jun 13 '14 at 09:51
  • 1
    The hash table needs to be persistent in a functional language. This often complicates implementations. – I GIVE CRAP ANSWERS Jun 13 '14 at 20:13
  • To elaborate, if you make persistent data structures in functional languages, all you really end up doing is writing the same code you would in assembly, except in each operation you explicitly transform your array of memory/registers, or talk to a server to pretend to do that. I'm all for being aware of your state, but it's isomorphic to the imperative approach if done correctly (you can't realistically copy a large amount of data on each transformation in real life, you need to cheat). – Dmytro Mar 30 '18 at 13:34
35

The main advantage of a binary tree over a hash table is that the binary tree gives you two additional operations you can't do (easily, quickly) with a hash table:

  • find the element closest to (not necessarily equal to) some arbitrary key value (or closest above/below)

  • iterate through the contents of the tree in sorted order

The two are connected -- the binary tree keeps its contents in a sorted order, so things that require that sorted order are easy to do.
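For illustration (a small C++ sketch of my own; the answer itself doesn't name any particular library), both operations fall out of std::map's ordered interface:

```cpp
#include <iostream>
#include <iterator>
#include <map>

int main() {
    std::map<int, char> tree = {{10, 'a'}, {20, 'b'}, {40, 'c'}};

    // Closest element at or above an arbitrary key: one O(log n) descent.
    auto above = tree.lower_bound(25);                                    // {40, 'c'}
    // Closest element strictly below it: step back one position.
    auto below = (above == tree.begin()) ? tree.end() : std::prev(above); // {20, 'b'}

    if (above != tree.end()) std::cout << "next key above 25: " << above->first << '\n';
    if (below != tree.end()) std::cout << "next key below 25: " << below->first << '\n';

    // Iterating the map is an in-order traversal, i.e. sorted order for free.
    for (const auto& [key, value] : tree) std::cout << key << ' ';
    std::cout << '\n';
}
```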

Chris Dodd
  • BST finds the closest match, only if the exact match doesn't exist, right? What if you find an exact match at the root itself? – developer747 Sep 06 '15 at 00:13
  • 3
    @developer747: Then the next closest below and above are the rightmost leaf of the left subtree and the leftmost leaf of the right subtree. – Chris Dodd Sep 08 '15 at 04:01
16

A (balanced) binary search tree also has the advantage that its asymptotic complexity is actually an upper bound, while the "constant" times for hash tables are amortized times: if you have an unsuitable hash function, you could end up degrading to linear time, rather than constant.

jamesnvc
  • 3
    To drive this point home, a degenerate case is when the collection contains many copies of just 1 key. in the BST, insert is O(log n), in a Hash table, insert is O(n) – SingleNegationElimination Nov 09 '10 at 00:07
  • 3
    When a hash table contains many copies of just 1 key, insert is (still) O(1), not O(n). The problem for hash tables is when there are many *different* keys with the same hash. This can be avoided by a dynamic hash scheme that switches to a different hash function when there are many collisions. – Chris Dodd Sep 14 '15 at 21:27
  • 1
    Note that an unbalanced tree can degenerate into a list and also have O(n) lookup. – awiebe Jan 02 '18 at 04:19
10

A hashtable would take up more space when it is first created - it will have available slots for the elements that are yet to be inserted (whether or not they are ever inserted); a binary search tree will only be as big as it needs to be. Also, when a hash table needs more room, expanding to another structure could be time-consuming, but that might depend on the implementation.

FrustratedWithFormsDesigner
10

A binary tree is slower to search and insert into, but has the very nice feature of in-order (infix) traversal, which essentially means that you can iterate through the nodes of the tree in sorted order.

Iterating through the entries of a hash table just doesn't make a lot of sense because they are all scattered in memory.

Blagovest Buyukliev
9

A binary search tree can be implemented with a persistent interface, where a new tree is returned but the old tree continues to exist. Implemented carefully, the old and new trees share most of their nodes. You cannot do this with a standard hash table.
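A rough sketch of that path-copying idea (my own illustration, assuming C++ with shared_ptr; the node type is made up): insert returns a new root, copies only the nodes along the search path, and shares every untouched subtree with the old tree.

```cpp
#include <memory>

// Hypothetical persistent BST node: all fields are immutable after creation.
struct Node {
    int key;
    std::shared_ptr<const Node> left, right;
};
using Tree = std::shared_ptr<const Node>;

// Returns a new tree containing `key`; the input tree is left untouched.
Tree insert(const Tree& t, int key) {
    if (!t) return std::make_shared<const Node>(Node{key, nullptr, nullptr});
    if (key < t->key)
        return std::make_shared<const Node>(Node{t->key, insert(t->left, key), t->right});
    if (key > t->key)
        return std::make_shared<const Node>(Node{t->key, t->left, insert(t->right, key)});
    return t;  // key already present: the "new" tree is just the old one
}

int main() {
    Tree v1 = insert(insert(nullptr, 5), 3);
    Tree v2 = insert(v1, 8);  // v1 is still valid; v2 shares v1's left subtree
}
```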

Pascal Cuoq
6

BSTs also provide the "findPredecessor" and "findSuccessor" operations (to find the next smallest and next largest elements) in O(log n) time, which might also be very handy operations. A hash table can't provide that time efficiency.

Balaji
  • 1
    If you are looking for "findPredecessor" and "findSuccessor" operations, then HashTable is a bad choice for data structure in the first place. – A.K.Desai Jan 23 '17 at 11:27
6

From Cracking the Coding Interview, 6th Edition

We can implement the hash table with a balanced binary search tree (BST). This gives us an O(log n) lookup time. The advantage of this is potentially using less space, since we no longer allocate a large array. We can also iterate through the keys in order, which can be useful sometimes.

Guy Kahlon
3

GCC C++ case study

Let's also get some insight from one of the most important implementations in the world. As we will see, it actually matches our theory perfectly!

As shown at What is the underlying data structure of a STL set in C++?, in GCC 6.4:

  • std::map uses BST
  • std::unordered_map uses hashmap

So this already points to the fact that you can't traverse a hashmap in sorted order efficiently, which is perhaps the main advantage of a BST.

And then, I also benchmarked insertion times in hash map vs BST vs heap at Heap vs Binary Search Tree (BST) which clearly highlights the key performance characteristics:

  • BST insertion is O(log n), hashmap is O(1). And in this particular implementation, hashmap is almost always faster than BST, even for relatively small sizes

  • hashmap, although much faster in general, has some extremely slow insertions visible as single points in the zoomed out plot.

    These happen when the implementation decides that it is time to increase its size, and it needs to be copied over to a larger one.

    In more precise terms, this is because only its amortized complexity is O(1), not the worst case, which is actually O(n) during the array copy.

    This might make hashmaps inadequate for certain real-time applications, where you need stronger time guarantees.

[Benchmark plot: insertion time for hash map, BST and heap; the hash map is fastest overall but shows isolated slow insertions at its resize points]


Ciro Santilli OurBigBook.com
1

It also depends on the use: a hash allows you to locate an exact match. If you want to query for a range, then a BST is the choice. Suppose you have lots of data e1, e2, e3, ..., en.

With hash table you can locate any element in constant time.

If you want to find the values greater than e41 and less than e8, a BST can quickly find them.

The key thing is the hash function used to avoid collisions. Of course, we cannot totally avoid collisions, in which case we resort to chaining or other methods. This means retrieval is no longer constant time in the worst case.

Once full, a hash table has to increase its bucket count and copy all the elements over again. This is an additional cost not present in a BST.
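For illustration (a minimal sketch of my own, assuming C++), you can watch those occasional expensive inserts by printing whenever std::unordered_map changes its bucket count, and avoid them up front when the final size is known:

```cpp
#include <cstddef>
#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    std::size_t buckets = m.bucket_count();

    for (int i = 0; i < 1000; ++i) {
        m[i] = i;
        if (m.bucket_count() != buckets) {    // a rehash just happened: every
            buckets = m.bucket_count();       // element was copied, so this
            std::cout << "rehash at insert "  // particular insert cost O(n)
                      << i << ", buckets now " << buckets << '\n';
        }
    }

    // Reserving up front avoids the rehash spikes entirely.
    std::unordered_map<int, int> m2;
    m2.reserve(1000);
}
```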

sreeprasad
1

If you want to access the data in a sorted manner, then a sorted list has to be maintained in parallel to the hash table. A good example is Dictionary in .Net. (see http://msdn.microsoft.com/en-us/library/3fcwy8h6.aspx).

This has the side effect of not only slowing inserts, but also consuming more memory than a B-tree.

Further, since a B-tree is sorted, it is simple to find ranges of results, or to perform unions or merges.

IamIC
1

Binary search trees are a good choice for implementing a dictionary if the keys have a total order (keys are comparable) defined on them and you want to preserve the order information.

As a BST preserves the order information, it provides you with four additional dynamic-set operations that cannot be performed (efficiently) using hash tables. These operations are:

  1. Maximum
  2. Minimum
  3. Successor
  4. Predecessor

All these operations, like every BST operation, have time complexity of O(H), where H is the height of the tree. Additionally, all the stored keys remain sorted in the BST, enabling you to get the sorted sequence of keys just by traversing the tree in order.

In summary, if all you want is the insert, delete and lookup operations, then a hash table is unbeatable (most of the time) in performance. But if you want any or all of the operations listed above, you should use a BST, preferably a self-balancing BST.

mightyWOZ
0

Hash tables are not good for indexing. When you are searching for a range, BSTs are better. That's the reason why most database indexes use B+ trees instead of hash tables.

Dmytro
ssD
  • Database indexes come in both types, hash and B+ tree. When you want to do a comparison like greater than or less than, a B+ tree index is useful; otherwise a hash index is useful for lookups. Also, when the data is not comparable and you want to create an index, the DB will create a hash index and not a B+ tree index. @ssD – Sukhmeet Singh Sep 27 '18 at 12:33
  • Can you provide sources for that "better" claim? – Nico Haase Jan 06 '21 at 09:25
0

A hashmap is a set-associative array. So, your array of input values gets pooled into buckets. In an open-addressing scheme, you have a pointer to a bucket, and each time you add a new value into a bucket, you find out where in the bucket there are free spaces. There are a few ways to do this: you can start at the beginning of the bucket and increment the pointer each time, testing whether the slot is occupied - this is called linear probing. Alternatively, the probe step can grow quadratically (1, 4, 9, ...) each time a slot is found occupied - this is called quadratic probing. OK. Now the problem in both these methods is that if the bucket overflows into the next bucket's address, then you need to:

  1. Double each bucket's size - malloc(N buckets) / change the hash function. Time required: depends on the malloc implementation.
  2. Transfer/copy each of the earlier buckets' data into the new buckets. This is an O(N) operation, where N represents the whole data.

OK, but if you use a linked list there shouldn't be such a problem, right? Yes, in linked lists you don't have this problem. But if each bucket begins with a linked list and you have 100 elements in a bucket, you have to traverse those 100 elements to reach the end of the linked list, so List.add(Element E) will take time to:

  1. Hash the element to a bucket - normal, as in all implementations.
  2. Find the last element in said bucket - an O(N) operation.

The advantage of the linked-list implementation is that you don't need the memory allocation operation and the O(N) transfer/copy of all buckets, as in the case of the open-addressing implementation.

So, the way to minimize the O(N) operation is to convert the implementation to that of a binary search tree, where find operations are O(log(N)) and you add the element in its position based on its value. The added feature of a BST is that it comes sorted!

Sᴀᴍ Onᴇᴌᴀ
0

Binary search trees can be faster when used with string keys, especially when the strings are long.

Binary search trees use less-than/greater-than comparisons, which are fast for strings (when they are not equal), since the comparison can stop at the first differing character. So a BST can quickly answer when a string is not found; when it is found, it will need to do only one full comparison.

In a hash table, you need to calculate the hash of the string, which means you need to go through all its bytes at least once to compute the hash, and then again when a matching entry is found, to verify it really is equal.

Calmarius