
Just trying to understand the linear probing logic.

With a hashtable using open addressing, how can you ever confirm that an element isn't in the table?

For example, say you had a 10-bucket hashmap. Suppose you hash a key and insert it. Now, if elements A and B hash and reduce to the same bucket, then with linear probing A and B will likely end up next to each other.

Now, just because a bucket is empty does not seem to mean that an element doesn't exist in the map. E.g. suppose you search for element B after element A has been removed. First you find an empty bucket where you expect B to be, but if you probe one more slot you'll find B; it really is there. If that logic is correct, won't you have to search the entire table to confirm that a key is not there? i.e. O(n) performance every time.

What I'm saying is, don't you need to go through the whole map to truly confirm it's not there?
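To make the worry concrete, here is a minimal sketch (a toy table with integer keys, hash = key mod table size, and names of my own invention) showing how naively emptying a slot on delete breaks a later lookup:

```python
class NaiveTable:
    """Open addressing with linear probing and a broken delete."""

    def __init__(self, size=10):
        self.slots = [None] * size

    def _home(self, key):
        return key % len(self.slots)   # toy hash: identity mod table size

    def insert(self, key):
        i = self._home(key)
        while self.slots[i] is not None:
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def contains(self, key):
        i = self._home(key)
        while self.slots[i] is not None:
            if self.slots[i] == key:
                return True
            i = (i + 1) % len(self.slots)
        return False                   # empty slot ends the probe sequence

    def naive_delete(self, key):
        i = self._home(key)
        while self.slots[i] is not None:
            if self.slots[i] == key:
                self.slots[i] = None   # just empty the slot: leaves a hole
                return
            i = (i + 1) % len(self.slots)

t = NaiveTable()
t.insert(3)               # lands in slot 3
t.insert(13)              # collides, probes on to slot 4
t.naive_delete(3)         # slot 3 is now empty again
print(t.contains(13))     # False! 13 is still in slot 4, but the probe
                          # for it stops at the hole in slot 3
```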

With a separate chaining approach to hashmaps, that problem doesn't exist.

For example: [figure: an open-addressing hash table in which John Smith, Sandra Dee and Ted Baker occupy consecutive buckets 152–154]

If John Smith is deleted, and we try to locate Sandra Dee.

Or with linear probing, are you meant to move elements around so that there are no holes as such? i.e. if John Smith is deleted, should the elements in buckets 152 to 154 be shifted back one place? I don't really see that in the description, but it might make some sense, except that if Ted Baker hashed to 154 and not 153 as described, he shouldn't be shifted. It requires a bit more work than I thought.

Might just go with a simple linked list at each bucket.

hookenz
    Found a great tutorial on hash tables. http://research.cs.vt.edu/AVresearch/hashing/ – hookenz Jun 14 '11 at 22:56
  • If you want to efficiently test if a value is not in a table, you might take a look at a Bloom filter: https://en.wikipedia.org/wiki/Bloom_filter – Alex Reynolds Jun 10 '21 at 19:53

3 Answers


In the absolute worst case, yes, determining whether or not some item is in the table is O(n). However, this will never happen in a properly managed hash table.

When an item is removed, a tombstone should be placed in the slot it was removed from. A tombstone is simply some data indicating that there used to be an element there, but it has since been removed. With this scheme, every time you search for an element you follow whatever probe sequence you are using until you find a slot that is truly empty. If a slot is empty, you have completed the probe sequence for that hash value and know that the element cannot be anywhere else in the table.

The only way that you would have to search through every slot in the probe sequence is if there are no empty slots in the probe sequence. Since you should always aim for a hash table to be about half empty, this should not happen.
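A minimal sketch of the tombstone idea (toy table, integer keys, hash = key mod table size; the class and sentinel names are mine):

```python
TOMBSTONE = object()   # sentinel: "something was deleted here, keep probing"

class TombstoneTable:
    """Open addressing with linear probing and tombstone deletion."""

    def __init__(self, size=10):
        self.slots = [None] * size

    def _home(self, key):
        return key % len(self.slots)   # toy hash: identity mod table size

    def insert(self, key):
        i = self._home(key)
        # Tombstoned slots may be reused on insert.
        while self.slots[i] is not None and self.slots[i] is not TOMBSTONE:
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

    def contains(self, key):
        i = self._home(key)
        while self.slots[i] is not None:      # only a truly empty slot stops us
            if self.slots[i] == key:
                return True
            i = (i + 1) % len(self.slots)     # tombstones are probed past
        return False

    def delete(self, key):
        i = self._home(key)
        while self.slots[i] is not None:
            if self.slots[i] == key:
                self.slots[i] = TOMBSTONE     # mark, don't empty
                return
            i = (i + 1) % len(self.slots)

t = TombstoneTable()
t.insert(3)               # slot 3
t.insert(13)              # collides, ends up in slot 4
t.delete(3)               # slot 3 becomes a tombstone, not an empty slot
print(t.contains(13))     # True: the probe passes the tombstone and finds 13
```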

Deddryk

Use of a hashtable that resolves conflicts with a probing strategy places stringent requirements on the deletion capabilities of the data structure. When you delete an item, you must compensate the hashtable so that it still meets the requirements needed for search (which is the main point of a hashtable).

Using linear probing, if you delete an item, you move to the next slot; if the item there hashed to the slot you just vacated, move it back into that slot. Rinse and repeat until you reach an empty slot. There are also lazy deletion strategies, where items are marked for deletion and then actually deleted/compensated on the next search.

Assume three items {A0, A1, A2} that have the same hash value. Let {A0, A1} be in the hashtable and {A2} not. If we delete A0, we mark it for deletion. When we later search for A2 (not in the hashtable) we find A0 (marked deleted), so we move on to A1, which we relocate into A0's slot, finalizing the deletion. We move to the next slot (which might hold A2, or another candidate for the slot A1 just occupied) but find it empty, so we clear out the slot A1 just vacated and our hashtable is back to a pristine state.
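The eager "shift entries back over the hole" strategy can be sketched as follows (toy table, integer keys, hash = key mod table size; function name is mine). Note the check against each entry's home slot: an entry may only be moved back if the hole lies on its probe path, which is exactly the Ted Baker subtlety from the question.

```python
def backward_shift_delete(slots, key):
    """Delete `key` from a linear-probing table, shifting later entries
    back so no hole is left in any probe chain. Assumes `key` is present."""
    n = len(slots)
    i = key % n                      # toy hash: identity mod table size
    while slots[i] != key:           # find the victim
        i = (i + 1) % n
    slots[i] = None                  # open the hole
    hole, j = i, (i + 1) % n
    while slots[j] is not None:
        home = slots[j] % n
        # Move slots[j] back only if the hole lies (cyclically) between its
        # home slot and j; otherwise moving it would break its probe chain.
        if (j - home) % n >= (j - hole) % n:
            slots[hole] = slots[j]
            slots[j] = None
            hole = j                 # the hole advances to j's old position
        j = (j + 1) % n

slots = [None] * 10
for k in (3, 13, 23):                # all home to slot 3 -> slots 3, 4, 5
    i = k % 10
    while slots[i] is not None:
        i = (i + 1) % 10
    slots[i] = k

backward_shift_delete(slots, 3)
print(slots[3], slots[4], slots[5])  # 13 23 None: chain shifted back, no hole
```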

Ethan Cabiac
  • I was meaning that if you use an array for a bucket implementation and that the bucket itself held the key/value as in open addressing. – hookenz Jun 14 '11 at 03:56
  • @Matt - got it. By using the term bucket, I assumed you were using a hashtable that supports collisions. I see you are using a non-colliding hashtable with probing for conflict resolution. – Ethan Cabiac Jun 14 '11 at 04:34
  • Thanks, I believe you're right. Both answers to this question are right. I like the tombstone approach in my case. – hookenz Jun 14 '11 at 22:50
  • @EthanCabiac Thanks for the explanation. I was searching this and offered a bounty for the same. If you can please answer my question which needs a slight more detail, I will accept it and reward the bounty to you. Please find the link of the question here. http://stackoverflow.com/questions/10855989/what-is-the-cost-of-deleting-a-value-from-a-hashtable – dharam Jun 06 '12 at 03:48

I know this is late, but it is worth mentioning the probe sequence and the maximum probe sequence length for a hash table.

Every time you insert an element, keep track of the maximum number of probes any past insertion has needed; if the current insertion needs more probes than that 'maxProb', update it.

Then, when you search for an element in the hashtable, you perform at most maxProb probes.

Provided you do not allow unbounded probing (up to N, the capacity of the hashtable), the running time is O(x), where x is the maximum allowed number of probes in the worst case.

If your hash key generation algorithm is poor enough to force many probes, you can implement the data structure so that whenever an insertion takes more than x probes, the table considers rehashing itself.
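A sketch of the idea (toy table, integer keys, hash = key mod table size; `max_probe` is my name for the answer's 'maxProb', and rehashing is left as a comment):

```python
class ProbeCappedTable:
    """Linear probing that records the longest probe sequence at insert
    time, so searches never probe more than max_probe slots."""

    def __init__(self, size=16):
        self.slots = [None] * size
        self.max_probe = 0           # longest probe sequence ever needed

    def insert(self, key):
        i = key % len(self.slots)    # toy hash: identity mod table size
        probes = 1
        while self.slots[i] is not None:
            i = (i + 1) % len(self.slots)
            probes += 1
        self.slots[i] = key
        self.max_probe = max(self.max_probe, probes)
        # A real implementation would rehash here if probes exceeded
        # some chosen threshold x.

    def contains(self, key):
        i = key % len(self.slots)
        for _ in range(self.max_probe):   # cap the search at max_probe slots
            if self.slots[i] == key:
                return True
            if self.slots[i] is None:
                return False
            i = (i + 1) % len(self.slots)
        return False

t = ProbeCappedTable()
for k in (5, 21, 37):        # all home to slot 5 -> 1, 2, 3 probes
    t.insert(k)
print(t.max_probe)           # 3
print(t.contains(37))        # True, found on the third probe
print(t.contains(53))        # False, search gives up after 3 probes
```

Note this sketch has no delete; combining a probe cap with deletion needs tombstones or backward shifting as in the other answers.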

dharam