25

I am trying to understand the open addressing method. I refer to T. H. Cormen's book on this topic, which states that deletion is difficult in open addressing. I am completely stuck at this paragraph:

Deletion from an open-address hash table is difficult. When we delete a key from slot i, we cannot simply mark that slot as empty by storing NIL in it. Doing so might make it impossible to retrieve any key k during whose insertion we had probed slot i and found it occupied.

I don't understand this. Please explain it with some examples.

Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
Nishant Kumar
  • 5,995
  • 19
  • 69
  • 95

3 Answers3

65

Assume hash(x) = hash(y) = hash(z) = i. And assume x was inserted first, then y and then z.
In open addressing: table[i] = x, table[i+1] = y, table[i+2] = z.

Now, assume you want to delete x, and set it back to NULL.

When later you will search for z, you will find that hash(z) = i and table[i] = NULL, and you will return a wrong answer: z is not in the table.

To overcome this, you need to set table[i] with a special marker indicating to the search function to keep looking at index i+1, because there might be element there which its hash is also i.

amit
  • 175,853
  • 27
  • 231
  • 333
  • Amit: Can you please explain this: Three techniques are commonly used to compute the probe sequences required for open addressing: linear probing, quadratic probing, and double hashing. These techniques all guarantee that h(k, 1), h(k, 2), ., h(k, m) is a permutation of 0, 1, . . . , m - 1 for each key k. None of these techniques fulfills the assumption of uniform hashing, however, since none of them is capable of generating more than m2 different probe sequences (instead of the m! that uniform hashing requires). – Nishant Kumar Feb 03 '12 at 11:41
  • 1
    @Amit if you assuming that hash(x) = hash(y) = hash(z) = i , then how can it be represented by open addressing . In that case you you require to maintain a linked list(or some other kind of chaining) of the records right ? – Geek Aug 17 '12 at 12:53
  • @amit Also I have a [related question here](http://stackoverflow.com/questions/12005405/are-open-addressing-in-hash-tables-only-useful-for-searching-how-do-the-elemen) . can you try to answer that ? – Geek Aug 17 '12 at 12:55
  • 1
    @Geek: In "open addressing", if hash(x) = hash(y) = i, and assume x was first inserted to index i. Then, when you insert y - you insert it to a different entree. The simplest (Though not best) way to do it is: `insert(y,position)=if (position is occupied) insert(y,position+1); else table[position]=y` – amit Aug 17 '12 at 20:14
  • its not that difficult in coalescing hashing, you can set value to null, but keep the `next` pointer to next index, in coalescing approach its like open addressing approach but there is `next` property which indicates next position in table. – M.kazem Akhgary Sep 03 '17 at 07:03
13

Deletion from a linear probed open addressed hash table is simple. There was pseudo code for it on the Wikipedia Hash Table page for years. I don't know why is isn't there any more, but here is a permalink back to when it was: Old Wikipedia Hash Table page, and here for your convenience is the pseudocode:

function remove(key)
 i := find_slot(key)
 if slot[i] is unoccupied
     return   // key is not in the table
 j := i
 loop
     j := (j+1) modulo num_slots
     if slot[j] is unoccupied
         exit loop
     k := hash(slot[j].key) modulo num_slots
     if (j > i and (k <= i or k > j)) or
        (j < i and (k <= i and k > j)) (note 2)
         slot[i] := slot[j]
         i := j
 mark slot[i] as unoccupied

There is also a ref on that page to some real code. I believe this has exactly the same performance characteristic as insertion.

This method of deletion is better than the much used 'mark deleted and occasionally rehash everything' because the above method is constant time rather than amortized constant time. If you have a hash table of a million items you are adding and deleting from, in the 'mark deleted' method, an occasional add or delete is going to take a million times longer than the ones before and after it - which is not a good performance characteristic.

hoopsnake
  • 131
  • 1
  • 3
  • Correct me if I'm wrong, but wouldn't it be simpler to just get a hash of key and then compare if hashes of key and slot[j] are equal? What I would do is just find the last occupied slot (starting from j = i + 1) with the same hash as key and then move it into the i-th slot. You have a single move operation and compute a hash for at most num_collisions times - complexity that is the same as for contains(key) and insert(key) operations. – recodeFuture Jun 17 '18 at 08:23
  • 1
    When all slots are either used or marked, searching for an unexisting element will last forever, or at least walk through whole table before finally concluding that it does not exist. This must happen if the program runs for a time long enough. – Meow Cat 2012 Aug 05 '18 at 05:14
  • I don't see why it would last forever. You first find the next empty slot, from there you backwards search for an entry that might hash to i. If found (slot j), you put that in the ith slot. You then repeat this procedure for slot j, until you donot find a match. Sure, worst case you scan everything, but assuming the load factor is low enough, this should perform in O(1) time. – john16384 Nov 08 '18 at 14:14
12

In an open addressing scheme, lookups invoke a series of probes until either the key is found or and empty slot is found.

If one key involves a chain of several probes, it will be lost (not findable) if somewhere along the chain, one of the other keys is removed, leaving an empty slot where a stepping stone was needed.

The usual solution is to delete a key by marking its slot as available-for-resuse-but-not-actually empty. In other words, a replacement stepping stone is added so that probe chains to other keys aren't cut short.

Hope this helps your understanding.

Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485