Hash Table Implementation - alternatives to collision detection

Question

Other than collision detection and throwing a LinkedList in a hashtable, what are some other ways that a Hash Table can be implemented? Is collision detection the only way to achieve an efficient hash table?

score 4 · Accepted Answer · edited May 23 '17 at 11:52

4

Ultimately a finite sized hash table is going to have collisions, at least any generally programmed one. If your key is type string then the hash table has an infinite number of possible keys, but with a hash table, you have just a finite number of buckets. So fundamentally there has to be collisions. If you were to implement a hash table where it ignores collisions, then you would have a very strange, indeterministic data structure that would appear to remove elements at random.

Now, the data structure used on the backend doesn't have to be a linked list. You could implement it as a red-black tree and get log(n) performance out of a collision. You should checkout the article 5 Myths About Hash Tables and also this Stack Overflow question about HashMaps vs Maps.

Now, if you know something about you key type, say the key is a 2 character long string, then there are only a finite number of possible keys, you can then proceed to create a "hash" function that converts the key to a relatively small integer, you could create a look-up table that is guaranteed to not have collisions.

It is important to note that a well-implemented hash table will not suffer very much from collisions. There are bigger problems in the world like world hunger (or even how to implement an efficient hash function) than the computer having to traverse three nodes in a linked list once every 5 days.

edited May 23 '17 at 11:52

Community

1
1

answered Sep 26 '15 at 00:10

Joshua Rahm

534
2
5

1

You could also use probing, which (while still a method of dealing with collisions) doesn't require any other structure for the bucket implementation (since each bucket will only store one item). Naturally, you'll have to resize the hash map each time you run out of space. – cyphar Sep 26 '15 at 00:24
True I forgot about probing lol. I think I remember that strategy being popo'd by my data structures professor because it is not any faster than a linked list and also fills up the hash table faster. (It's better for memory complexity I guess) – Joshua Rahm Sep 26 '15 at 00:27
Oh, definitely. Probing is not efficient *at all* (it has linear lookup time and a fixed limit of the amount of entries), but it does fulfil the "alternative implementation of hashtables that doesn't use linked lists" requirement. :P – cyphar Sep 26 '15 at 00:30
@cypher: not so... probing can be *extremely* efficient: especially for small values that can fit directly in the table, and when compared to tables of pointers (in)to linked lists, it's often an order of magnitude faster for insertions and searches. The significant issues with it include deletions, which are hard to perform in O(1)-ish time without allowing the performance of all future operations to degrade, and rehashes to increase capacity, which inherently have to move all the values (much as a `vector`, whereas `std::unordered_X` doesn't move value during a rehash). – Tony Delroy Sep 28 '15 at 08:24

score 0 · Answer 2 · answered Sep 28 '15 at 08:39

Other than collision detection and throwing a LinkedList in a hashtable, what are some other ways that a Hash Table can be implemented?

Other ways include:

having another container type linked from the nodes where elements have collided, such as a balanced binary tree or vector/array
GCC's hash table underpinning std::unordered_X uses a single singly-linked list of values, and a contiguous array of buckets container iterators into the list; that's got some great characteristics including optimal iteration speed regardless of the current load_factor()
using open addressing / closed hashing, which - when an insert/find/erase finds another key in the bucket it has hashed to, uses some algorithm to find another bucket to look in instead (and so on until it finds the key, a deleted element it can insert over, or an unused bucket); there are a number of options for this kind of "probing", the simplest being a try-the-next-bucket approach, another being quadratic 1, 4, 9, 16..., another the use of alternative hash functions.
perfect hash functions (below)

Is collision detection the only way to achieve an efficient hash table?

sometimes it's possible to find a perfect hash function that won't have collisions, but that's generally only true for very limited input sets, whether due to the nature of the inputs (e.g. month and year of birth of living people only has order-of a thousand possible values), or because a small number are known at compile time (e.g. a set of 200 keywords for a compiler).

Hash Table Implementation - alternatives to collision detection

2 Answers2