15

We all know that a hash table has O(1) time for both inserts and look-ups if the hash function is well chosen. So, what is the reason we would want to use a binary search tree? Just because a perfect hash function is hard to design?

Here is how I came up with this question: I noticed that the standard C++ STL has set and map, which are implemented with binary search trees, but it has no hash-based containers (not talking about the non-standard hash_set and hash_map), while Ruby has only Hash. I want to understand the rationale behind this difference.

KLE
  • 23,689
  • 4
  • 56
  • 62
pierrotlefou
  • 39,805
  • 37
  • 135
  • 175

8 Answers

25

Trees allow in-order traversal.

The worst-case performance of a hash table is O(N) (a linear search through one bucket), whereas a binary search is bounded by O(log N).

NB: this requires the tree to be balanced; that's why typical implementations use a self-balancing tree, such as a red-black tree.

While such a degradation is unlikely, it is not impossible, and it depends strongly on the ability to choose an appropriate hash function and on the distribution of the actual data.

A tree implementation also grows trivially to the required size, whereas a hash map starts to degrade when it gets full (for most implementations, this is said to happen at around 70% of the buckets filled). You then either need to rehash the entire table (again, bad for real-time apps) or incrementally migrate to a new table, which is not a simple implementation.

In the end, STL probably just went with one "base" container template, the tree, to avoid the additional implementation complexity.

peterchen
  • 40,917
  • 20
  • 104
  • 186
  • 1
    A binary tree can be 100% unbalanced which means it takes the shape of a linked list. This means that its worst case performance is *O(n)*. – Björn Lindqvist Nov 04 '16 at 11:30
  • 1
    @BjörnLindqvist: True - that's why tree-based containers typically use a self-balancing tree, such as a red-black-tree (https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) – peterchen Nov 04 '16 at 13:19
9

To add to peterchen's answer: hash structures, although theoretically faster at insertion and removal, depend vastly on the actual data, the chosen hash function, and the amount of data.

  • A perfect hash function depends on the amount and distribution of the data.

Having large performance variations between best and worst cases makes them unfit for general-purpose structures. Binary trees, on the other hand, are more predictable regardless of the amount and type of data used, even though they are less efficient in the best-case scenario.

Vasco Fernandes
  • 688
  • 1
  • 4
  • 11
6

The STL didn't initially include a hash table among the containers, as hash tables are more complex: you have to choose between open and closed addressing, not to mention the hash function, etc. At the time, Stepanov and Stroustrup were trying to speed up progress on the STL so that it would be quickly accepted into the standard.

Trees, on the other hand, are relatively simpler. It was already known that, since these are in-memory data structures, a plain binary tree could be used instead of a B-tree. The choice was then between AVL and RB trees. RB trees tend to be chosen due to better performance characteristics, which I am not in a position to comment on, but the Wikipedia articles on both structures (AVL and RB) will tell you more in relatively good detail.

Otherwise, trees and hash tables are good for different things. If you need fast inserts and retrievals, and don't care about the order items are stored in, hash tables are good. If you need ordering and strong guarantees on inserts and retrievals, then binary trees are good. Another good rule of thumb is to profile: since most uses of either would be interface-compatible, profiling to see which gives you better performance helps too.

blwy10
  • 4,862
  • 2
  • 24
  • 23
3

You can access the data in a binary search tree in order.

Stephen Denne
  • 36,219
  • 10
  • 45
  • 60
1

Well, search trees are ordered; hashes are not.

Nifle
  • 11,745
  • 10
  • 75
  • 100
1

To use a tree you need a way to order the items in the tree. To use a hash table you need a function that computes the hash value of an item.

Interestingly, the .NET framework requires every class to implement (or inherit) the GetHashCode method, enabling every object to be stored in a hash table. However, this also places an additional burden on developers, who are required to implement semantically correct hash functions even if they don't intend the class to be hashed. One solution is to return a constant value from GetHashCode, which is semantically correct but very inefficient should the function ever be used for hashing.

Martin Liversage
  • 104,481
  • 22
  • 209
  • 256
1

If you can get away with it, you should always prefer a hash over a binary search tree. Hashes have higher memory overhead than trees, but all the memory they use can be allocated in one big block. With trees, each node added requires a separate allocation, which causes heavy fragmentation and is bad for performance. It is similar to how you would much rather read 1000 bytes from one file than 1 byte from 1000 different files.

The case in which hashes don't work is when ordering matters. For example, suppose you are writing a memory allocator and you store free blocks of memory in a data structure. The keys are the sizes of the blocks, and the values are pointers to them.

A request for memory entails searching this data structure for the smallest (this implies ordering!) block satisfying the request. For example, if you have blocks with keys 10, 20, and 30, and a request for 20 bytes of memory comes in, you select the second block. A hash map can do that easily.

But what if the request is for 22 bytes? Since there is no key with the value 22, you have to iterate the whole hash map to find the right key (30), which is an O(n) operation. But if you had used a tree, then "find the smallest key not less than a given key" is an O(log n) operation.

Björn Lindqvist
  • 19,221
  • 20
  • 87
  • 122
0

At the time of C++, people were still fans of a hard-core academic approach to data structures and algorithms, so they preferred structures with a smaller memory footprint and well-understood best- and worst-case behavior.

By the time Ruby appeared, and for the purposes of scripting, people had realized they favor simplicity over raw performance, and since hash tables allow the semantics of both arrays (if you use a sequential index as the key) AND of dictionaries (if you use a natural key), they were deemed the more universal data structure.

Andriy Volkov
  • 18,653
  • 9
  • 68
  • 83