7

From the JavaDoc of HashMap:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put).

If we have a higher value, why would it increase the lookup cost?

Geek
  • @PaulTomblin is load factor = bucket size / number of keys? If that were the case, collisions should decrease, because increasing the load factor would mean increasing the numerator while the number of keys stays constant. – Geek Aug 25 '12 at 12:12
  • Check this: http://stackoverflow.com/questions/10901752/what-is-the-significance-of-load-factor-in-hashmap – user1613360 Aug 25 '12 at 12:15
  • @user1613360 You should learn how to put a link in a comment. [Check this out](http://meta.stackexchange.com/questions/37758/inline-links-in-comments). BTW, I have seen that answer before asking the question; it is a copy/paste of the Javadoc. – Geek Aug 25 '12 at 12:20

5 Answers

6

A hash table's load factor is defined as

n/s, the ratio of the number of stored entries n and the size s of the table's array of buckets.

A hash table performs well when the number of collisions is low. When the load factor is high, the number of buckets used to store a given number of entries is lower, which increases the probability of collisions.
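
To make the ratio concrete, here is a minimal sketch (the class name LoadFactorDemo is made up; the arithmetic mirrors what HashMap documents, not its actual source):

    public class LoadFactorDemo {
        public static void main(String[] args) {
            int capacity = 16;        // HashMap's documented default bucket count
            float loadFactor = 0.75f; // documented default load factor

            // HashMap grows the table once size exceeds capacity * loadFactor.
            int threshold = (int) (capacity * loadFactor);
            System.out.println("Resize after " + threshold + " entries"); // 12

            // A higher load factor keeps the same entries in fewer buckets
            // for longer, so more entries share a bucket on average.
            float higher = 1.0f;
            System.out.println("Resize after " + (int) (capacity * higher)
                    + " entries"); // 16
        }
    }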

Sergey Kalinichenko
  • I thought it was s/n, hence the confusion. See my comment to Paul Tomblin. Thanks for clearing up my doubt. – Geek Aug 25 '12 at 12:22
3

Here we should first understand what capacity and load factor mean:

capacity: the number of buckets in the hash table at any given point in time.

load factor: a measure of how full the hash table is allowed to get before its capacity is automatically increased.

So the higher the load factor, the more occupied a hash table can get before its capacity is increased.

  • Given the best possible implementation of hashCode(), only one value goes in each bucket, and the lookup cost is minimal.
  • In the worst case, all values go into the same bucket, and the lookup cost is maximal.
  • The average case also depends on the hashCode() implementation, but one more factor plays a role here: the load factor. The more occupied the table is, the higher the chance of collisions, so a higher load factor increases lookup cost in any non-ideal scenario (see the sketch below).
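
A minimal sketch of the worst case above, assuming a hypothetical key class BadKey whose hashCode() is constant, so every entry lands in one bucket and each lookup must search that single bucket:

    import java.util.HashMap;
    import java.util.Map;

    public class HashCodeExtremes {
        // Hypothetical key whose hashCode() sends every instance to the
        // same bucket -- the worst case described above.
        static final class BadKey {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
            @Override public int hashCode() { return 42; } // constant: all collide
        }

        public static void main(String[] args) {
            Map<BadKey, String> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new BadKey(i), "v" + i);
            }
            // With a single overfull bucket, get() must search that bucket
            // (a linked list, or a tree since Java 8) instead of finding
            // the entry almost directly.
            System.out.println(map.get(new BadKey(9_999))); // "v9999"
        }
    }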
Vivek Gupta
2

It has to do with how a hash table is implemented under the hood: it uses hash codes, and since the algorithm that computes hash codes is not perfect, you can get collisions. Increasing the load factor increases the probability of collisions and consequently reduces lookup performance.

aleroot
0

The default load factor is 0.75. If you declare the load factor as 1, the restructuring (rehashing) process only happens when the number of elements exceeds the capacity of the HashMap.
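
For illustration, the load factor can be passed to HashMap's two-argument constructor; a small sketch (the capacity 16 here is just the documented default):

    import java.util.HashMap;
    import java.util.Map;

    public class LoadFactorOne {
        public static void main(String[] args) {
            // With loadFactor = 1.0f the resize threshold is
            // 16 * 1.0 = 16 entries, instead of the default 16 * 0.75 = 12.
            Map<String, Integer> map = new HashMap<>(16, 1.0f);
            for (int i = 0; i < 16; i++) {
                map.put("key" + i, i); // fills the table; no resize yet
            }
            map.put("key16", 16); // exceeds capacity: table doubles to 32 buckets
        }
    }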
Mohammod Hossain
0

The load factor of 0.75 can be interpreted using the formula n/s, the ratio of the number of stored entries n to the size s of the table's array of buckets:

Suppose you have 75 values to store in a hash table and 100 empty array blocks to store them in. Here the chance of collision is low, and the load factor is 0.75.

Now suppose you have 75 values to store and only 10 empty array blocks (load factor 7.5). This means you will have collisions and must employ a collision resolution technique, which increases your search time.

Going the other way, if you have 75 entries and 1000 empty array blocks (load factor 0.075), you are left with many empty blocks, which wastes a lot of space.

Hence the rule of thumb: as the load factor grows, your search time increases, and as it approaches 0, more storage space is wasted.

Hence it is a space-time trade-off.
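
The three scenarios can be checked numerically; a throwaway sketch (class and variable names are made up):

    public class LoadFactorRatios {
        public static void main(String[] args) {
            int n = 75;                      // stored entries
            int[] buckets = {100, 10, 1000}; // table sizes s
            for (int s : buckets) {
                System.out.printf("n=%d, s=%d -> load factor %.3f%n",
                        n, s, (double) n / s);
            }
            // n=75, s=100  -> 0.750 (good balance)
            // n=75, s=10   -> 7.500 (many collisions, slow lookups)
            // n=75, s=1000 -> 0.075 (mostly empty: wasted space)
        }
    }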

Vaibhav Desai