
As the Java doc explains:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put).

I don't understand how increasing the load factor, say to 1, increases the lookup time.

For example: if the initial capacity is 16 and the load factor is 1, then resizing to 32 buckets happens after the size reaches 16 * 1 = 16. Now, if I put any new entry, how will the lookup time be higher compared to a load factor of .75 (in which case the HashMap would have resized at size 16 * .75 = 12)?
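Here is a minimal sketch of the two configurations I am comparing, using the standard HashMap(int initialCapacity, float loadFactor) constructor; the threshold comments reflect my reading of the OpenJDK implementation, where resizing happens once size exceeds capacity * loadFactor:

    import java.util.HashMap;
    import java.util.Map;

    public class LoadFactorThreshold {
        public static void main(String[] args) {
            // Load factor 1.0: threshold = 16 * 1.0 = 16, so the table only grows
            // to 32 buckets once the 17th entry is inserted.
            Map<String, String> packed = new HashMap<>(16, 1.0f);

            // Default load factor 0.75: threshold = 16 * 0.75 = 12, so the table
            // grows to 32 buckets as soon as the 13th entry is inserted.
            Map<String, String> roomy = new HashMap<>(16, 0.75f);

            for (int i = 0; i < 16; i++) {
                packed.put("key" + i, "value" + i);
                roomy.put("key" + i, "value" + i);
            }
            // Both maps now hold the same 16 entries, but 'roomy' is already using
            // 32 buckets while 'packed' is still squeezing 16 entries into 16 buckets.
        }
    }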

As this answer to What is the significance of load factor in HashMap? says, the fewer the free buckets, the higher the chance of collision.

I am not sure how the number of free buckets is related to the chance of collision.

Per my understanding, the bucket is decided based on the hash code of the key object. Only if it comes out the same as the hash code of a key object already in a bucket is there a chance of collision; otherwise the entry goes to a different bucket (out of the available buckets). So how is collision related to free buckets? Do you mean to say that even if the hash code is different and the HashMap is full, it will try to fit the entry into an existing bucket?

It's not a duplicate of What is the significance of load factor in HashMap?. I am asking about a specific point which is not answered in that link.

emilly

2 Answers


So how is collision related to free buckets? Do you mean to say that even if the hash code is different and the HashMap is full, it will try to fit the entry into an existing bucket?

The hash code of a key can be any int value (from -2^31 to 2^31 - 1). This means that the range of hashCode() values is usually much larger than the number of buckets. Therefore two keys with different hash codes may be mapped to the same bucket.

For example, if the number of buckets is 16, hash codes 1, 17, 33, 49, etc... will all be mapped to bucket #1.

If there are more buckets, fewer distinct hash codes map to any given bucket, so there is less chance of the same bucket holding multiple entries.

For example, if the number of buckets is increased to 32, now hash codes 1 & 33 will still be mapped to bucket #1, but hash codes 17, 49, etc... will be mapped to a different bucket (#17) - hence less chance of collision.
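Here's a quick sketch of that mapping. The real HashMap picks the bucket with a bit mask over a spread hash, but for these small non-negative hash codes a plain modulo gives the same buckets:

    public class BucketMapping {
        // Illustrative bucket index. For a power-of-two bucket count n and a
        // non-negative hash, (n - 1) & hash and hash % n pick the same bucket.
        static int bucket(int hashCode, int bucketCount) {
            return hashCode % bucketCount;
        }

        public static void main(String[] args) {
            int[] hashCodes = {1, 17, 33, 49};

            for (int h : hashCodes) {
                System.out.println("hash " + h + " -> bucket " + bucket(h, 16) + " of 16");
            }
            // All four hash codes land in bucket #1 of 16: one bucket, four entries.

            for (int h : hashCodes) {
                System.out.println("hash " + h + " -> bucket " + bucket(h, 32) + " of 32");
            }
            // With 32 buckets, 1 and 33 stay in bucket #1 while 17 and 49 move to #17.
        }
    }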

When the load factor < 1, the average number of entries in each bucket is < 1, which gives you a good chance of constant lookup time (unless your key has a terrible hashCode implementation that returns few unique values).

Eran

A hash table is backed by an array.
The size of the backing array and the number of entries in the hash table are not the same number.
An element can end up in exactly the same slot as another element due to a collision, and the number of collisions also depends on the size of the backing array (besides the hash function), since the index is computed along the lines of
index_in_array = Math.abs(element.hashCode() % array.length);
Even a perfect hash function won't help if you need to use a really small backing array for a very large number of elements.
The larger the backing array, the more room there is to place an element and the lower the chance of a collision.
The load factor (the ratio of the elements inserted to the size of the array) determines when the backing array should get bigger.
For a load factor of 1, resizing only happens once you have exhausted all slots of the backing array, which means more collisions, because in practice you will never get exactly 1 element per slot.
If you could guarantee that case, you should be using a plain array in the first place. A sketch of the effect follows.
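A minimal sketch of that idea, assuming String keys and using Math.floorMod instead of Math.abs (Math.abs(hashCode % length) goes wrong for Integer.MIN_VALUE):

    public class BackingArrayDemo {
        // Index formula from the answer, with Math.floorMod to keep the index
        // non-negative even for hash codes like Integer.MIN_VALUE.
        static int indexFor(Object element, int arrayLength) {
            return Math.floorMod(element.hashCode(), arrayLength);
        }

        // Counts how many inserts land in a slot that is already occupied.
        static int countCollisions(String[] keys, int arrayLength) {
            int[] occupancy = new int[arrayLength];
            int collisions = 0;
            for (String key : keys) {
                int index = indexFor(key, arrayLength);
                if (occupancy[index] > 0) {
                    collisions++;
                }
                occupancy[index]++;
            }
            return collisions;
        }

        public static void main(String[] args) {
            String[] keys = new String[32];
            for (int i = 0; i < keys.length; i++) {
                keys[i] = "key-" + i;
            }
            // The same 32 keys hashed into backing arrays of different sizes:
            System.out.println("8 slots:  " + countCollisions(keys, 8) + " collisions");
            System.out.println("64 slots: " + countCollisions(keys, 64) + " collisions");
            // 32 keys cannot fit in 8 slots without collisions, while the 64-slot
            // array usually spreads the same keys over mostly distinct slots.
        }
    }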

Cratylus