
I am currently reading CLRS and looking at how hash tables are defined there. When it discusses open hashing, using chaining for collision resolution, there is a piece of text that reads:

If the hash table supports deletion, then its linked lists should be doubly linked
so that we can delete an item quickly. (If the lists were only singly linked, then to
delete element x, we would first have to find x in the list T[h(x.key)] so that we
could update the next attribute of x’s predecessor. With singly linked lists, both
deletion and searching would have the same asymptotic running times.)

What I don't understand is why (or more specifically how) this is the case. This is not a question about why deleting from a linked list, when you already know the element, is faster if it's a doubly linked list; I thought I should make that clear...

This is related to how it is possible to actually know the location, and so how a doubly linked list can make this difference. Looking at the hash delete pseudocode (below), the key is used to generate the hash, which leads to the correct array index where the linked list can be found, but how exactly can that be translated into the actual position of the item in the linked list?

CHAINED-HASH-DELETE(T, x)
1 delete x from the list T[h(x.key)]

It looks to me like hashing the key leads to the linked list only, so in either case the list would still need to be searched to find the actual value currently being deleted?

I'm sure there is a simple answer, but it is just not clear to me!

berimbolo

1 Answer


It looks to me like hashing the key leads to the linked list only, so in either case the list would still need to be searched to find the actual value currently being deleted?

That's right: when someone asks to erase(key), there's no benefit from a doubly-linked list, as it is easy to erase while traversing even a singly-linked list.
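
For example, here is a minimal sketch of erasing by key from one singly-linked bucket (the Node layout and function name are made up for illustration, not the book's or any library's code). Tracking the address of the link that reaches the current node means no separate "previous" pointer is ever needed:

struct Node { int key; Node* next; };

// Walk the bucket holding a pointer to the link that reaches the current node,
// so unlinking is a single pointer assignment once the key is found.
bool erase_by_key(Node*& bucket_head, int key) {
    for (Node** link = &bucket_head; *link; link = &(*link)->next) {
        if ((*link)->key == key) {
            Node* doomed = *link;
            *link = doomed->next;   // splice the node out of the bucket
            delete doomed;
            return true;
        }
    }
    return false;                   // key not present in this bucket
}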

Still, it's quite common to have code like (pseudo-code)...

// hash_table, do_something_with and should_erase are placeholders
if (auto i = hash_table.find(key); i != hash_table.end()) {  // one bucket lookup
    do_something_with(*i);        // reuse the iterator: no repeated lookup
    if (should_erase(*i))
        hash_table.erase(i);      // erase via the iterator: no search needed
}

Here, the iterator i can be a pointer to a node in the doubly-linked list, so each dereferencing operation *i to access the element/value associated with that node doesn't have to repeat any part of the hash table bucket lookup or doubly-linked list search.

If it then decides to erase(i), the iterator identifies the node to erase without needing another search.
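
As a rough sketch (the DNode layout below is hypothetical, purely to illustrate the point): once you hold a pointer to the node itself - what CLRS calls "element x" - a doubly-linked bucket lets you unlink it with no search at all:

struct DNode { int key; DNode* prev; DNode* next; };

// O(1) removal given the node pointer: just re-route the neighbours' links.
void erase_node(DNode*& bucket_head, DNode* x) {
    if (x->prev)
        x->prev->next = x->next;
    else
        bucket_head = x->next;      // x was the first node in the bucket
    if (x->next)
        x->next->prev = x->prev;
    delete x;
}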

If you only used a singly-linked list, then for erase(iterator) to avoid re-searching, the iterator would need to store a pointer to the previous element's next pointer (which is actually what GCC's C++ hash table containers do). If it only stores that (to keep iterators smaller), then to access the element the iterator logically addresses it has to first look up the previous element's next pointer, then follow that to the logically addressed node: that's not super efficient. If it stores both the address of the logically-addressed element and the previous element (or more specifically its next pointer), then iterators become larger objects, also with potential memory usage and performance impact. Still, you're saving a "previous" pointer per link in the list, and there are more likely to be lots of elements than lots of iterators.
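
To illustrate that last design (names are invented; this is not GCC's actual code), a singly-linked bucket iterator can store the address of the previous node's next pointer: erasing through it is O(1) with no re-search, but every dereference pays one extra hop:

struct SNode { int key; SNode* next; };

struct BucketIterator {
    SNode** prev_next;              // &bucket_head, or &previous_node->next

    // Extra indirection: follow the stored link to reach the addressed node.
    SNode& operator*() const { return **prev_next; }

    // O(1) erase with no re-search: splice out the node this iterator addresses.
    void erase() {
        SNode* doomed = *prev_next;
        *prev_next = doomed->next;
        delete doomed;
    }
};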

Tony Delroy
  • do you think this is what they are getting at in the book? It is only describing the delete process itself. I completely get your answer and that makes sense, but if all we are doing is calling delete on an object from the hashtable, I don't understand why they suggest that a linked list will ensure the item can be deleted in constant time; that implies the list does not need to be searched, but a pointer to the actual item is somehow supplied and `next` and `previous` can just be updated, like when deleting from a doubly linked list. – berimbolo Dec 31 '20 at 11:21
  • *"deleted in constant time, that implies the list does not need to be searched"* - deletion happens in *amortised* constant time - O(1) - because the average length of these lists is constant, since 1) the max load factor is constant (i.e. the hash table will be resized if #elements/#buckets passes a threshold, 2) the distribution of keys across buckets should be fairly even (see *Linked List Length Relates To Load Factor* in my answer [here](https://stackoverflow.com/a/30567466/410767) for some stats). Summarily, typical list length doesn't trend up with #elements, so no big-O impact. – Tony Delroy Dec 31 '20 at 15:38
  • Thanks again, so does it actually make any difference regarding the deletion if the list is linked or not, as they imply? I still think, like your answer suggests (I think), that it's no. – berimbolo Dec 31 '20 at 17:20
  • @berimbolo: we normally talk about singly-linked (`struct Node { Node* p_next; T value; };`) and doubly-linked (`struct Node { Node *p_next, *p_prev; T value; };`) lists, rather than "linked or not". Lists have to be linked (i.e. embed a `Node*`) unless you implement a list-like concept in e.g. an array where the next node is implied, and that doesn't scale (in the big-O sense) for efficient operations on large datasets. But it's no harder or slower to implement an `erase(const Key&)` function on a singly-linked list compared to a doubly-linked one. – Tony Delroy Dec 31 '20 at 19:23
  • That said, the text you quoted from CLRS contains *"delete element x, we would first have to find x in the list T[h(x.key)]"* - it's horribly worded, but when they say "element x" there they don't mean the element *by value* (i.e. an independent element object with the key that should be found and deleted), they mean if you have a pointer/reference/iterator to the specific element object x *already stored in the hash table*. I can infer that because I understand the operations involved and *only then* is the rest of their paragraph correct, but the wording is actively misleading and confusing. – Tony Delroy Dec 31 '20 at 19:31
  • Sorry I missed the word "doubly linked", just reread my comment – berimbolo Jan 01 '21 at 00:13
  • Actually, now I look again at the delete function, given your explanation it's a lot clearer: of course there is already a reference to the node, because it accesses `x.key`. Thanks a lot for the explanation and additional information; the book is very difficult to get to grips with, especially for a so-called 'gentle introduction'. Happy new year! – berimbolo Jan 01 '21 at 00:19
  • @berimbolo Happy New Year! – Tony Delroy Jan 01 '21 at 03:30