
The initial capacity and load factor are two parameters that affect HashMap performance. The default load factor (0.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost.
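
For example (the values here are purely illustrative), both parameters can be passed to the constructor:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative values: start with 32 buckets and resize once the map is ~90% full
    Map<String, Integer> map = new HashMap<String, Integer>(32, 0.90f);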

When an item is added to the HashMap, it is assigned to a bucket based on a value derived from its hashCode and the number of buckets in the HashMap. To identify the bucket for any key, HashMap uses key.hashCode() and performs some operation:

    bucketIndex = indexFor(hash(key.hashCode()), table.length)
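
A rough sketch of that computation (modeled on the pre-Java-8 implementation; the exact bit-mixing differs between JDK versions):

    // Extra bit-mixing to spread weak hashCodes across the buckets
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // The table length is always a power of two, so (length - 1) acts as a bit mask
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }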

When the number of entries in the hash map exceeds the product of the load factor and the current capacity, the hash map is rehashed (internal data structures are rebuilt), so that the hash map has approximately twice the number of buckets.
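
With the defaults, that means a resize is triggered roughly like this (a simplified sketch; the names are illustrative):

    int capacity = 16;           // default initial capacity
    float loadFactor = 0.75f;    // default load factor
    int threshold = (int) (capacity * loadFactor);   // 12

    // conceptually, after each put():
    // if (size > threshold) resize(2 * table.length);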

When the map is rehashed and everything is moved to the new table, the existing elements are rehashed as well and stored in new buckets according to their recomputed bucket indices. The old array that held the entries becomes eligible for garbage collection.

If two threads discover at the same time that the HashMap needs resizing and both try to resize it, a race condition can occur.

During resizing, the elements of a bucket, which are stored in a linked list, get reversed in order as they migrate to the new bucket, because the Java HashMap inserts each element at the head of the list rather than appending it at the tail (to avoid traversing to the tail). If the race condition occurs, you can end up with an infinite loop.
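
A simplified model of that migration (loosely based on the pre-Java-8 transfer step; the names are illustrative) shows why the order reverses and why concurrent resizes can stitch the list into a cycle:

    // Minimal, illustrative model of the pre-Java-8 resize/transfer step
    class ResizeSketch {
        static class Entry {
            final int hash;
            Entry next;
            Entry(int hash) { this.hash = hash; }
        }

        Entry[] table = new Entry[16];

        static int indexFor(int h, int length) {
            return h & (length - 1);      // table length is a power of two
        }

        // Each entry is pushed onto the HEAD of its new bucket, so entries
        // that share a bucket end up in reverse order after the transfer.
        void transfer(Entry[] newTable) {
            for (Entry e : table) {
                while (e != null) {
                    Entry next = e.next;                       // rest of the old chain
                    int i = indexFor(e.hash, newTable.length);
                    e.next = newTable[i];                      // head insertion
                    newTable[i] = e;
                    e = next;
                    // If two threads interleave these pointer writes on the same
                    // entries, two nodes can end up pointing at each other, and a
                    // later get() loops forever over that cycle.
                }
            }
        }
    }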

I have the following questions:

  1. Why is the linked list for each bucket reversed in order during migration to a new bucket?
  2. How can the race condition lead to an infinite loop?
  3. How can increasing the number of buckets reduce the lookup time?
  4. Will elements that are in the same bucket still be together in the same bucket after rehashing?
Leri
R.A.S.
  • Your post is a set of questions, which does not conform to Q&A concept: single question and the answers making possible to objectively select the best. – Danubian Sailor Sep 26 '13 at 06:56
  • Where did you get all this information? I can't see any evidence for most of it in the JDK 6 source code, latest I have to hand. – user207421 Sep 26 '13 at 07:35
  • 1. A single-linked-list is faster to transfer from one bucket to another if you reverse it. Think of it as a stack, you pop from stack 1, push to stack 2, both `O(1)` operations. While appending to a SLL is `O(n)` – millimoose Sep 26 '13 at 07:54
  • 3. The lookup speed when using chaining is `O(m)` where `m` is the average bucket size. – millimoose Sep 26 '13 at 07:56
  • +1 for your Question. May be you got the right answer by now. But the same confusion I had faced. I found the below link. Have a look: http://mailinator.blogspot.hu/2009/06/beautiful-race-condition.html – Sandy Jan 02 '14 at 10:02
  • This happens because HashMap doesn't append the new element at the tail instead it appends the new element at the head. So when rehashing occurs, it reads each element and inserts it in the new bucket at the head and then keeps on adding next elements from the old map at the head of the new map resulting in reversal of linked list. – Amit Jul 22 '15 at 10:00

2 Answers


That's why we have ConcurrentHashMap. For the overwhelming majority of cases where one is not sharing one's map across multiple threads without synchronization, plain-old HashMap suffices.
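
As a minimal illustration, sharing the map between threads is then just a matter of swapping the implementation:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Safe for concurrent reads and writes without external locking,
    // so the unsynchronized-resize race described above cannot occur.
    Map<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
    counts.put("requests", 1);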

There is no guarantee that two objects that collide with n buckets will still collide with 2n buckets. Just using a counting argument it should be about half as likely that any two objects collide. Fewer collisions mean shorter lists, which means the retrieval time is shorter.
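
As a small illustration (the hash values are chosen arbitrarily), two hashes that share a bucket in a 16-bucket table can land in different buckets once the table doubles to 32:

    public class BucketSplitDemo {
        public static void main(String[] args) {
            int h1 = 0x15;                    // binary 10101
            int h2 = 0x05;                    // binary 00101
            System.out.println(h1 & 15);      // 5  -> same bucket with 16 buckets
            System.out.println(h2 & 15);      // 5
            System.out.println(h1 & 31);      // 21 -> different buckets with 32 buckets
            System.out.println(h2 & 31);      // 5
        }
    }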

Because we are rehashing and collisions are not consistent across different numbers of buckets, I am skeptical that you are reading the code correctly when you assert that each bucket's list is being reversed as part of the process.

Judge Mental
  • To answer your 2nd question may be this link will help to understand. [link] http://tekmarathon.wordpress.com/2012/12/04/hashmap-internal-implementation-analysis-in-java/ – sats Jan 09 '14 at 20:03
  • Ans for 1 : This happens because HashMap doesn't append the new element at the tail instead it appends the new element at the head. So when rehashing occurs, it reads each element and inserts it in the new bucket at the head and then keeps on adding next elements from the old map at the head of the new map resulting in reversal of linked list. – Amit Jul 22 '15 at 10:00
  1. Implementation detail - I don't know - probably for performance reasons.
  2. I don't know if it could result in an infinite loop, but since there is no synchronisation in HashMap it is not thread safe anyway, so how it breaks is not so important: it will break one way or another...
  3. You end up with fewer items per bucket, so searching among the items in a given bucket is quicker.
  4. No, that's the point of rehashing. Imagine a simple hashing algorithm such as `index = hash % numberOfBuckets`, for example.
assylias