0

I was looking at this StackOverflow answer to understand hashing better and saw the following (regarding the fact that we would need to get bucket size in constant time):

if you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though--it's linear on the number of items that hashed to the same or a colliding value.

What does this mean that it's "linear on the number of items that hashed to the same or a colliding value"? Wouldn't it be linear on total number of items in the hashtable, since it's possible that it will need to walk through every value during linear probing? I don't see why it would just have to go through the ones that collided.

Like for example, if I am using linear probing (step size 1) on a hashtable and I have different keys (not colliding, all hash to unique values) mapping to the odd index slots 1,3,5,7,9..... Then, I want to insert many keys that all hash to 2, so I fill up all my even index spots with those keys. If I wanted to know how many keys hash to 2, wouldn't I need to go through the entire hash table? But I'm not just iterating through items that hashed to the same or colliding value, since the odd index slots are not colliding.

rb612
  • 5,280
  • 3
  • 30
  • 68
  • `If I wanted to know how many keys hash to 2` There is no reason for wanting to know this. It is also not the *purpose* of a hash table. In any case: [with linear chaining] for an *almost full* hashtable, having only one empty spot, adding one element will always fill this empty spot, whatever its key value (or hashed key value) will be. – joop Apr 16 '18 at 09:48
  • @joop I apologize, I think I didn't phrase my question correctly. In the referenced StackOverflow article, they talk about how there must be a `bucket_size()` operation that runs in linear time. To do this from my understanding, I would need to know how many keys hash to 2 by walking through the collision chain. What I don't see, however, is how this isn't just linear on the number of elements, but it's "linear on the number of items that hashed to the same or a colliding value". I suppose I'm not clear on what "colliding value" means in this context, given that hashing to same val = colliding – rb612 Apr 17 '18 at 01:08

1 Answers1

1

A hash table is conceptually similar to an array (table) of linked lists (bucket in the table). The difference is in how you manage and access that array: using a function to generate a number that is used to compute the array index.

Once you have two elements placed in the same bucket (the same computed value, i.e. collission), then the problem turns out to be a search in a list. The number of elements in the list is hopefully lower than the total elements in the hash table (meaning that other elements exist in other buckets).

However, you are skipping the important introduction in that paragraph:

If you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though -- it's linear on the number of items that hashed to the same or a colliding value.

Linear probing is a different implementation of a hash table in which you don't use any list (chain) for your collissions. Instead, you just find the nearest available spot in the array, starting from the expected position and going forward. The more populated the array is, the higher the chance is to find that the next position is being used too, so you just need to keep searching. The positions are used by items that hashed to the same or colliding value, although you are never (and you don't really care) which of these two cases is, unless you explicitly see the hash of the existing element there.

This CppCon presentation video makes a good introduction and in-depth analysis of hash tables.

Jorge Bellon
  • 2,901
  • 15
  • 25
  • Thank you! I think I understand most of that - I suppose I'm just confused mostly why it's linear on "the number of items that hashed to the same or a colliding value." I don't really understand how hashing to a colliding value would be different from hashing to the same value. – rb612 Apr 16 '18 at 08:48
  • The point of searching in the list after hashing is because you may find values that **are not** what you are looking for. If all the values were the same, you would just return the first (constant). Since not all the elements are the same, you have to look for them. [This picture](https://en.wikipedia.org/wiki/Hash_function#/media/File:Hash_table_4_1_1_0_0_1_0_LL.svg) shows how *John Smith* and *Sandra Dee* evaluate to the same hash value, so when searching you would need to check which of those is what you have in the table. They share the same value so they will be in the same list. – Jorge Bellon Apr 16 '18 at 09:53
  • That does make sense, thank you for your helpful responses, although it just quite isn't what I'm getting at. From my understanding OP in the post I referenced wants to return the amount of items hashing to a given value using linear probing. OP says that this operation is "linear on the number of items that hashed to the same or a colliding value" But I don't understand what OP means about also having to check colliding values. – rb612 Apr 17 '18 at 01:06
  • Oh, I see now. [**Linear probing**](https://en.wikipedia.org/wiki/Linear_probing) is the case in which you don't use a list for colliding elements. You only make use of the array. The hash result is used as an index, and from that point on you just need to find for an empty position in that array. If there are many positions occupied by others, you keep searching: the more populated the array is, the more it takes to find a *hole* for the new value. – Jorge Bellon Apr 17 '18 at 08:02
  • I've edited the answer to add that point. Sorry to miss it before! – Jorge Bellon Apr 17 '18 at 08:09
  • No problem - thank you for adding that. My main question is what do you mean exactly by "items that hashed to the same or colliding value". What does the "same or colliding value" mean? When we hash to the same value, that's a collision, so what is the "or colliding value" part mean? That's what I'm confused about from both answers. – rb612 Apr 17 '18 at 16:42
  • Maps and sets are organized around keys, which is the element you want to search for (in *John Smith* example, its the name and surname). The same value would be another *John Smith*, and a colliding value would be a different value that produces the same hash (*Sandra Dee*). The result of the hash is always a number, which is used to guess in which position the key/value will be placed. – Jorge Bellon Apr 18 '18 at 08:15