
This is the controversial line from Cracking the Coding Interview on hash tables.

Another common implementation (besides linked-list) for a hash table is to use a BST as the underlying data structure.

I know this question has been asked before, but it's confusing because people give two different answers. For example:

Why implement a Hashtable with a Binary Search Tree?

The highest-voted answer in that post says the quoted statement is talking about a hash table implemented with a binary search tree, without an underlying array. I understand the idea: since each inserted element gets a hash value (an integer), the elements form a total order (every pair can be compared with < and >), so we can simply use a binary search tree to hold the elements of the hash table.
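To make sure I understand the first post's reading, here is a minimal sketch (all class and method names are mine, not from the book or the post): a map backed by a single BST keyed on hash values, with no underlying array. Equal hashes are stashed in a small per-node list.

```python
class _Node:
    def __init__(self, h, key, value):
        self.h = h                    # hash of the key (the BST's ordering key)
        self.items = [(key, value)]   # entries whose keys share this hash
        self.left = None
        self.right = None

class BSTHashMap:
    """Hash map with a BST ordered by hash value as the only storage."""

    def __init__(self):
        self.root = None

    def put(self, key, value):
        h = hash(key)
        if self.root is None:
            self.root = _Node(h, key, value)
            return
        node = self.root
        while True:
            if h == node.h:
                # Same hash: replace an existing key or append a new entry.
                for i, (k, _) in enumerate(node.items):
                    if k == key:
                        node.items[i] = (key, value)
                        return
                node.items.append((key, value))
                return
            elif h < node.h:
                if node.left is None:
                    node.left = _Node(h, key, value)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = _Node(h, key, value)
                    return
                node = node.right

    def get(self, key):
        h = hash(key)
        node = self.root
        while node is not None:
            if h == node.h:
                for k, v in node.items:
                    if k == key:
                        return v
                return None
            node = node.left if h < node.h else node.right
        return None
```

If this is what the first post means, then lookup is O(log n) rather than expected O(1), which is part of what confuses me.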

On the other hand, others say

Hash table - implementing with Binary Search Tree

the book is saying that we should handle collisions with a binary search tree. So there is an underlying array, and when collisions occur because multiple elements get the same hash value and land in the same slot of the array, that's where the BST comes in.

So each slot in the array will be a pointer to a BST, which holds elements with the same hash value.
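If I understand this second reading correctly, a minimal sketch (names are mine) would be an array of buckets where each bucket is a BST ordered by the keys themselves, which assumes the keys are comparable:

```python
class _Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

class ChainedBSTMap:
    """Hash map: array of buckets, each bucket a BST ordered by key."""

    def __init__(self, n_buckets=8):
        self.buckets = [None] * n_buckets

    def put(self, key, value):
        i = hash(key) % len(self.buckets)
        self.buckets[i] = self._insert(self.buckets[i], key, value)

    def _insert(self, node, key, value):
        if node is None:
            return _Node(key, value)
        if key == node.key:
            node.value = value          # update in place
        elif key < node.key:            # keys colliding in a bucket are
            node.left = self._insert(node.left, key, value)   # ordered by <
        else:
            node.right = self._insert(node.right, key, value)
        return node

    def get(self, key):
        node = self.buckets[hash(key) % len(self.buckets)]
        while node is not None:
            if key == node.key:
                return node.value
            node = node.left if key < node.key else node.right
        return None
```

But this sketch only works because I compare the keys directly with <, which leads to my question below about how colliding elements are supposed to be ordered.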

I'm leaning towards the second post's argument, because the first post doesn't really explain how such an implementation can handle collisions. And I don't think it can achieve expected O(1) insert/delete/lookup time.

But for the second post: if we have multiple elements that get the same hash value and are placed in the same BST, I'm not sure how these elements are ordered (how can they be compared against each other?).

Please, help me put an end to this question once and for all!

namesake22

1 Answer


the first post does not really explain how such implementation of a hash table can handle collisions

With a BST, you can use a hashing function that produces no duplicate keys, so there would be no collisions. The advantage here isn't speed but reduced memory consumption and better worst-case guarantees. If you're writing software for a critical real-time system, you might not be able to tolerate an O(n) resizing of your hash table.

if we have multiple elements that get the same hash value and placed in a BST, I'm not sure how these elements are ordered (how can they be compared against each other?)

Rehash with another function.
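As a hedged illustration of what I mean (both function names here are made up, not a standard API): give each bucket's BST a second, independent hash function and order colliding entries by it.

```python
def primary_hash(key, n_buckets):
    # Picks the bucket; distinct keys can collide here.
    return hash(key) % n_buckets

def secondary_hash(key):
    # FNV-1a over the key's string form; any independent hash would do.
    h = 2166136261
    for ch in str(key):
        h = ((h ^ ord(ch)) * 16777619) & 0xFFFFFFFF
    return h

# Two keys that land in the same bucket under primary_hash can still be
# ordered inside that bucket's BST by comparing their secondary hashes:
colliders = sorted(["a", "q"], key=secondary_hash)
```

This way the BST inside a bucket never needs the keys themselves to be comparable, only their secondary hashes.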

In the end, it all depends on what your data structure is used for (is memory or speed more important? amortized or worst-case performance? etc.).

Eric
  • About your comments on the first blockquote... a hash function that doesn't produce any duplicate values is a perfect hash function, right? If we use a PHF and a BST for this implementation, what are some disadvantages? There has to be some drawback to it – namesake22 Jun 21 '17 at 01:39
  • can you also explain why the BST + PHF implementation would cause a O(n) resizing time? – namesake22 Jun 21 '17 at 01:59
  • 1
    PHF + BST has the disadvantage of being slower since insertion and search would be O(log n) vs. a Array-based hash table which has O(1). In C++, std::map uses a BST with PHF implementation while std::unordered_map uses an array, and std::map is almost always slower than std::unordered_map (except in the case where std::map is empty). The BST + PHF doesn't have O(n) resizing, it's the array + BST in each bucket to handle collisions that has to resize in O(n), usually when the number of buckets in the array reaches 50%. – Eric Jun 21 '17 at 12:08