126

I've heard in my degree classes that a HashTable will place a new entry into the 'next available' bucket if the new Key entry collides with another.

How would the HashTable still return the correct Value if this collision occurs when calling for one back with the collision key?

I'm assuming that the Keys are String type and the hashCode() returns the default generated by say Java.

If I implement my own hashing function and use it as part of a look-up table (i.e. a HashMap or Dictionary), what strategies exist for dealing with collisions?

I've even seen notes relating to prime numbers! Information not so clear from Google search.

nazia
  • 75
  • 2
  • 9
Alex
  • 4,844
  • 7
  • 44
  • 58

10 Answers10

108

Hash tables deal with collisions in one of two ways.

Option 1: By having each bucket contain a linked list of elements that are hashed to that bucket. This is why a bad hash function can make lookups in hash tables very slow.

Option 2: If the hash table entries are all full then the hash table can increase the number of buckets that it has and then redistribute all the elements in the table. The hash function returns an integer and the hash table has to take the result of the hash function and mod it against the size of the table that way it can be sure it will get to bucket. So by increasing the size, it will rehash and run the modulo calculations which if you are lucky might send the objects to different buckets.

Java uses both option 1 and 2 in its hash table implementations.

nazia
  • 75
  • 2
  • 9
ams
  • 60,316
  • 68
  • 200
  • 288
  • 1
    In the case of the first option, is there any reason a linked list is used instead of an array or even a binary search tree? –  May 05 '13 at 05:28
  • 1
    the above explanation is high level, I don't think it makes much difference as to linked list vs. array. I think a binary search tree would be overkill. Also I think if you dig into things like ConcurrentHashMap and others there any many low level implement details that can make a performance difference, that the high level explanation above does not account for. – ams May 06 '13 at 14:11
  • 5
    If chaining is used, when given a key, how do we know which item to get back? – ChaoSXDemon Oct 10 '15 at 04:04
  • 2
    @ChaoSXDemon you can traverse the list in the chain by key, duplicate keys are not the issue the issue is two different keys having the same has hashcode. – ams Oct 12 '15 at 01:32
  • 1
    @ams : Which one is preferred ? is There any limit for Hash collision, after which 2nd point is executed by JAVA ? – Shashank Vivek Nov 17 '17 at 12:42
91

When you talked about "Hash Table will place a new entry into the 'next available' bucket if the new Key entry collides with another.", you are talking about the Open addressing strategy of Collision resolution of hash table.


There are several strategies for hash table to resolve collision.

First kind of big method require that the keys (or pointers to them) be stored in the table, together with the associated values, which further includes:

  • Separate chaining

enter image description here

  • Open addressing

enter image description here

  • Coalesced hashing
  • Cuckoo hashing
  • Robin Hood hashing
  • 2-choice hashing
  • Hopscotch hashing

Another important method to handle collision is by Dynamic resizing, which further has several ways:

  • Resizing by copying all entries
  • Incremental resizing
  • Monotonic keys

EDIT: the above are borrowed from wiki_hash_table, where you should go to have a look to get more info.

herohuyongtao
  • 49,413
  • 29
  • 133
  • 174
  • 3
    "[...] requires that the keys (or pointers to them) be stored in the table, together with the associated values". Thanks, this is the point that is not always immediately clear when reading about mechanisms for storing values. – mtone Jun 28 '15 at 21:54
34

There are multiple techniques available to handle collision. I will explain some of them

Chaining: In chaining we use array indexes to store the values. If hash code of second value also points to the same index then we replace that index value with an linked list and all values pointing to that index are stored in the linked list and actual array index points to the head of the the linked list. But if there is only one hash code pointing to an index of array then the value is directly stored in that index. Same logic is applied while retrieving the values. This is used in Java HashMap/Hashtable to avoid collisions.

Linear probing: This technique is used when we have more index in the table than the values to be stored. Linear probing technique works on the concept of keep incrementing until you find an empty slot. The pseudo code looks like this:

index = h(k) 

while( val(index) is occupied) 

index = (index+1) mod n

Double hashing technique: In this technique we use two hashing functions h1(k) and h2(k). If the slot at h1(k) is occupied then the second hashing function h2(k) used to increment the index. The pseudo-code looks like this:

index = h1(k)

while( val(index) is occupied)

index = (index + h2(k)) mod n

Linear probing and double hashing techniques are part of open addressing technique and it can only be used if available slots are more than the number of items to be added. It takes less memory than chaining because there is no extra structure used here but its slow because of lot of movement happen until we find an empty slot. Also in open addressing technique when an item is removed from a slot we put an tombstone to indicate that the item is removed from here that is why its empty.

For more information see this site.

rassar
  • 5,412
  • 3
  • 25
  • 41
Jatinder Pal
  • 719
  • 8
  • 3
18

I strongly suggest you to read this blog post which appeared on HackerNews recently: How HashMap works in Java

In short, the answer is

What will happen if two different HashMap key objects have same hashcode?

They will be stored in same bucket but no next node of linked list. And keys equals () method will be used to identify correct key value pair in HashMap.

zengr
  • 38,346
  • 37
  • 130
  • 192
11

I've heard in my degree classes that a HashTable will place a new entry into the 'next available' bucket if the new Key entry collides with another.

This is actually not true, at least for the Oracle JDK (it is an implementation detail that could vary between different implementations of the API). Instead, each bucket contains a linked list of entries prior to Java 8, and a balanced tree in Java 8 or above.

then how would the HashTable still return the correct Value if this collision occurs when calling for one back with the collision key?

It uses the equals() to find the actually matching entry.

If I implement my own hashing function and use it as part of a look-up table (i.e. a HashMap or Dictionary), what strategies exist for dealing with collisions?

There are various collision handling strategies with different advantages and disadvantages. Wikipedia's entry on hash tables gives a good overview.

Daniel Valland
  • 1,057
  • 4
  • 21
  • 45
Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
  • It's true for both `Hashtable` and `HashMap` in jdk 1.6.0_22 by Sun/Oracle. – Nikita Rybak Feb 12 '11 at 21:42
  • @Nikita: not sure about Hashtable, and I don't have access to the sources right now, but I am 100% certain HashMap uses chaining and not linear probing in every single version I've ever seen in my debugger. – Michael Borgwardt Feb 12 '11 at 21:45
  • @Michael Well, I'm looking at the source of HashMap's `public V get(Object key)` right now (same version as above). If you do find precise version where those linked lists appear, I'd be interested to know. – Nikita Rybak Feb 12 '11 at 21:51
  • @Niki: I'm now looking at the same method, and I see it using a for loop to iterate through a linked list of `Entry` objects: `localEntry = localEntry.next` – Michael Borgwardt Feb 12 '11 at 22:17
  • @Michael Sorry, it's my mistake. I interpreted code in wrong way. naturally, `e = e.next` is not `++index`. +1 – Nikita Rybak Feb 12 '11 at 22:28
  • Ah, while I was writing my answer, you already resolved this discussion :-( – Paŭlo Ebermann Feb 12 '11 at 22:38
  • you actually answered all question rather than accepted one – Eklavyaa Dec 31 '16 at 07:28
7

Update since Java 8: Java 8 uses a self-balanced tree for collision-handling, improving the worst case from O(n) to O(log n) for lookup. The use of a self-balanced tree was introduced in Java 8 as an improvement over chaining (used until java 7), which uses a linked-list, and has a worst case of O(n) for lookup (as it needs to traverse the list)

To answer the second part of your question, insertion is done by mapping a given element to a given index in the underlying array of the hashmap, however, when a collision occurs, all elements must still be preserved (stored in a secondary data-structure, and not just replaced in the underlying array). This is usually done by making each array-component (slot) be a secondary datastructure (aka bucket), and the element is added to the bucket residing on the given array-index (if the key does not already exist in the bucket, in which case it is replaced).

During lookup, the key is hashed to it's corresponding array-index, and search is performed for an element matching the (exact) key in the given bucket. Because the bucket does not need to handle collisions (compares keys directly), this solves the problem of collisions, but does so at the cost of having to perform insertion and lookup on the secondary datastructure. The key point is that in a hashmap, both the key and the value is stored, and so even if the hash collides, keys are compared directly for equality (in the bucket), and thus can be uniquely identified in the bucket.

Collission-handling brings the worst-case performance of insertion and lookup from O(1) in the case of no collission-handling to O(n) for chaining (a linked-list is used as secondary datastructure) and O(log n) for self-balanced tree.

References:

Java 8 has come with the following improvements/changes of HashMap objects in case of high collisions.

  • The alternative String hash function added in Java 7 has been removed.

  • Buckets containing a large number of colliding keys will store their entries in a balanced tree instead of a linked list after certain threshold is reached.

Above changes ensure performance of O(log(n)) in worst case scenarios (https://www.nagarro.com/en/blog/post/24/performance-improvement-for-hashmap-in-java-8)

Community
  • 1
  • 1
Daniel Valland
  • 1,057
  • 4
  • 21
  • 45
  • Can you explain how worst case insertion for a linked-list HashMap is only O(1), and not O(N)? Seems to me that if you have a collision rate of 100% for non-duplicate keys, you end up having to traverse every object in the HashMap to find the end of the linked list, right? What am I missing? – mbm29414 Jul 30 '18 at 23:34
  • In the specific case of hashmap implementation you are actually right, but not because you need to find the end of the list. In a general case linked-list implementation, a pointer is stored to both head and tail, and hence insertion can be done in O(1) by attaching the next node to the tail directly, but in the case of hashmap, the insert method needs to ensure no duplicates, and thus must search the list to check if the element already exists, and hence we end up with O(n). And so it is the set property imposed on a linked-list that is causing O(N). I will make a correction to my answer :) – Daniel Valland Aug 01 '18 at 00:19
4

here's a very simple hash table implementation in java. in only implements put() and get(), but you can easily add whatever you like. it relies on java's hashCode() method that is implemented by all objects. you could easily create your own interface,

interface Hashable {
  int getHash();
}

and force it to be implemented by the keys if you like.

public class Hashtable<K, V> {
    private static class Entry<K,V> {
        private final K key;
        private final V val;

        Entry(K key, V val) {
            this.key = key;
            this.val = val;
        }
    }

    private static int BUCKET_COUNT = 13;

    @SuppressWarnings("unchecked")
    private List<Entry>[] buckets = new List[BUCKET_COUNT];

    public Hashtable() {
        for (int i = 0, l = buckets.length; i < l; i++) {
            buckets[i] = new ArrayList<Entry<K,V>>();
        }
    }

    public V get(K key) {
        int b = key.hashCode() % BUCKET_COUNT;
        List<Entry> entries = buckets[b];
        for (Entry e: entries) {
            if (e.key.equals(key)) {
                return e.val;
            }
        }
        return null;
    }

    public void put(K key, V val) {
        int b = key.hashCode() % BUCKET_COUNT;
        List<Entry> entries = buckets[b];
        entries.add(new Entry<K,V>(key, val));
    }
}
Jeffrey Blattman
  • 22,176
  • 9
  • 79
  • 134
4

It will use the equals method to see if the key is present even and especially if there are more than one element in the same bucket.

Hovercraft Full Of Eels
  • 283,665
  • 25
  • 256
  • 373
4

As there is some confusion about which algorithm Java's HashMap is using (in the Sun/Oracle/OpenJDK implementation), here the relevant source code snippets (from OpenJDK, 1.6.0_20, on Ubuntu):

/**
 * Returns the entry associated with the specified key in the
 * HashMap.  Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

This method (cite is from lines 355 to 371) is called when looking up an entry in the table, for example from get(), containsKey() and some others. The for loop here goes through the linked list formed by the entry objects.

Here the code for the entry objects (lines 691-705 + 759):

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

  // (methods left away, they are straight-forward implementations of Map.Entry)

}

Right after this comes the addEntry() method:

/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket.  It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
    if (size++ >= threshold)
        resize(2 * table.length);
}

This adds the new Entry on the front of the bucket, with a link to the old first entry (or null, if no such one). Similarily, the removeEntryForKey() method goes through the list and takes care of deleting only one entry, letting the rest of the list intact.

So, here is a linked entry list for each bucket, and I very doubt that this changed from _20 to _22, since it was like this from 1.2 on.

(This code is (c) 1997-2007 Sun Microsystems, and available under GPL, but for copying better use the original file, contained in src.zip in each JDK from Sun/Oracle, and also in OpenJDK.)

Paŭlo Ebermann
  • 73,284
  • 20
  • 146
  • 210
  • 1
    I marked this as *community wiki*, since it doesn't really is a answer, more a bit of discussion to the other answers. In comments simply isn't enough space for such code citations. – Paŭlo Ebermann Feb 12 '11 at 22:37
2

There are various methods for collision resolution.Some of them are Separate Chaining,Open addressing,Robin Hood hashing,Cuckoo Hashing etc.

Java uses Separate Chaining for resolving collisions in Hash tables.Here is a great link to how it happens: http://javapapers.com/core-java/java-hashtable/