
In HashMap, why is the threshold value (the next size value at which to resize) capacity * load factor? Why is it not equal to the size or capacity of the map?

For example, initially the default capacity = 16 and load factor = 0.75, hence threshold = (capacity * load factor) = (16 * 0.75) = 12.

The map resizes when we add the 13th element. Why is that? Why did the authors of HashMap decide to keep the threshold at capacity * load factor (which is 12)? Why not the same as the capacity (which is 16)?

Why not keep the threshold equal to the capacity, so that rehashing only takes place when the HashMap gets full?
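
To see the resize concretely, here is a rough sketch (not part of HashMap's public API) that watches the table grow by peeking at the private `table` field via reflection; the field name and its lazy allocation are OpenJDK implementation details, and newer JDKs may require `--add-opens java.base/java.util=ALL-UNNAMED`:

```java
import java.lang.reflect.Field;
import java.util.HashMap;

class ResizeDemo {
    // Reads HashMap's internal bucket array length via reflection (implementation detail).
    static int capacityOf(HashMap<?, ?> map) throws Exception {
        Field tableField = HashMap.class.getDeclaredField("table");
        tableField.setAccessible(true);
        Object[] table = (Object[]) tableField.get(map);
        return table == null ? 0 : table.length; // the table is allocated lazily on first put
    }

    public static void main(String[] args) throws Exception {
        HashMap<Integer, Integer> map = new HashMap<>(); // default capacity 16, load factor 0.75
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
            System.out.println("size=" + i + " capacity=" + capacityOf(map));
        }
        // Expected: capacity stays 16 up to size 12, then doubles to 32 on the 13th put.
    }
}
```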

atish shimpi
niiraj874u

4 Answers


Javadoc, Javadoc, Javadoc. That is the first place you look. On the HashMap it says:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
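
To make the last sentence of that quote concrete, here is a small sketch of pre-sizing a map so that it never rehashes (the expected entry count is just a made-up example):

```java
import java.util.HashMap;
import java.util.Map;

class PreSizedMap {
    public static void main(String[] args) {
        int expectedEntries = 10_000;   // hypothetical: you know roughly how many entries to expect
        float loadFactor = 0.75f;

        // Per the Javadoc: if the initial capacity is greater than the expected number of
        // entries divided by the load factor, no rehash ever occurs.
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);
        Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);   // stays below the threshold, so no resize happens
        }
    }
}
```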

As for the theory of hash maps - if your map is full, then you're doing something very, very wrong. By that time you're likely at O(sqrt(N)) on lookups with random data - BAD. You never want your hash map to be full. But a very sparse map will waste too much space (as you've noted) and will take too long to iterate through. Hence there should be a load factor that is less than 1 for most use cases.

Note: The "wasted space" is proportional to the size of the map and inversely proportional to the load factor. However, lookup times have a more complex expected-performance function. This means that the same load factor will not work for hash maps of different sizes - it will mean different scale tradeoffs.


A general overview of the tradeoffs can be found in Knuth's "The Art of Computer Programming", Vol. 3.

Ordous
  • @Ordous Thanks for the answer, but I know the significance of the load factor very well; my question is: what is the problem if we keep the threshold equal to the capacity? If anything in my question is unclear, please tell me – niiraj874u Dec 09 '14 at 17:10
  • @niiraj874u I've added some stuff as you were typing the comment. If a hash map is ever full, its performance may degrade very quickly, depending on how the buckets are organized. – Ordous Dec 09 '14 at 17:12
  • @Ordous Let me state my understanding, and tell me whether I have understood you correctly. If we keep the threshold equal to the capacity, there is no noticeable performance degradation with a small map (there may be some, but small enough to ignore), but there will be a real performance impact when the size is in the thousands or hundreds of thousands. – niiraj874u Dec 09 '14 at 17:18
  • @niiraj874u Kind of, yes. It's explored in some depth by Knuth in "The Art of Computer Programming", Vol. 3. Exploring it further requires quite a bit of maths and probability. This is, of course, only for Java's linked-list bucket implementation. Scatter or open-addressing implementations will simply stop working if the hash map is full. – Ordous Dec 09 '14 at 17:30
  • 1
    @niiraj874u: If you “know the significance of load factor very well” your question doesn’t make much sense. After all, it’s the *only* purpose of the `load factor` to determine the `threshold` as `threshold = capacity * load factor`. Having a `threshold==capacity` is equivalent to having a `load factor` of one. – Holger Dec 09 '14 at 18:15
  • By the way, starting with Java 8 the impact of collisions is much smaller as it switches to a binary tree maintaining the colliding values once a certain number of collisions has been reached for a particular map location. – Holger Dec 09 '14 at 18:22

From a theory perspective, the likelihood of maintaining no collisions with a full hash table is very low, so hash tables are resized to maintain their desired O(1) lookup property - fewer collisions mean more direct access to entries and less searching.
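
As a rough illustration of why fuller tables collide more (a toy balls-into-bins simulation, not HashMap's actual hashing), compare how many randomly placed keys land in an already-occupied bucket at load factors 0.75 and 1.0:

```java
import java.util.Random;

class CollisionSketch {
    public static void main(String[] args) {
        int buckets = 1 << 16;                     // 65,536 buckets
        Random rnd = new Random(42);
        for (double loadFactor : new double[] {0.75, 1.0}) {
            int keys = (int) (buckets * loadFactor);
            boolean[] occupied = new boolean[buckets];
            int collisions = 0;
            for (int i = 0; i < keys; i++) {
                int bucket = rnd.nextInt(buckets); // pretend this is a well-distributed hash
                if (occupied[bucket]) collisions++;
                else occupied[bucket] = true;
            }
            System.out.printf("load factor %.2f: %d of %d keys collided%n",
                    loadFactor, collisions, keys);
        }
    }
}
```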

C.B.
  • Thanks for your answer. How can we ensure that collisions will be low with a low threshold value? When I add 500 elements to a HashMap, the threshold becomes 768 and the capacity grows to 1024. So even though I have only 500 elements the capacity is 1024, and it will become 2048 when I add the 769th element - here we can see the wasted space. – niiraj874u Dec 09 '14 at 17:04

The HashMap implementation allows you to set the load factor. This design decision gives the user of the class some measure of control over the conditions under which the underlying data structure is resized.

The default load factor value of 0.75 was likely chosen as a reasonable balance between memory usage and map performance (determined by collision rate and resize overhead).

For any given instance of HashMap, you get to choose the appropriate load factor for your particular situation. You need to consider the relative importance of a small memory footprint, how performance-sensitive you are for lookups, and how performance-sensitive you are for puts (a put that causes the map to be rebuilt can be very slow).
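
For example (the values here are arbitrary, just to show the two knobs the constructor exposes):

```java
import java.util.HashMap;
import java.util.Map;

class LoadFactorChoices {
    public static void main(String[] args) {
        Map<String, Integer> compact  = new HashMap<>(16, 1.0f); // less wasted space, more collisions
        Map<String, Integer> spacious = new HashMap<>(16, 0.5f); // more space, fewer collisions
        Map<String, Integer> defaults = new HashMap<>();         // capacity 16, load factor 0.75
    }
}
```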

As an aside, your concept of a "full" HashMap is a little skewed. The implementation handles an arbitrary number of collisions just fine (although there is a performance cost to collisions). You could use a HashMap with a load factor of 1 billion and it would (probably) never grow beyond a capacity of 16.

There is no problem with setting the load factor to 1.0, which would result in a rehash operation when you add the 17th element to a default-sized HashMap. Compared to the default of 0.75, you will use a little less space, do fewer rehashes, and have a few more collisions (and thus more searching through bucket linked lists using equals()).

Rob

In Java 8, once the number of entries in a single bucket reaches a treeify threshold, the contents of that bucket switch from a linked list to a balanced tree, which improves worst-case lookup within the bucket from O(n) to O(log n). This is one of the Java 8 features worth remembering.
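
A quick way to see this in action is a key class with a deliberately terrible hashCode, so every entry lands in the same bucket (BadKey is a made-up example; the per-bucket treeify threshold of 8 is an OpenJDK implementation detail):

```java
import java.util.HashMap;
import java.util.Map;

// All instances share hashCode() == 42, so they all collide into one bucket.
// In Java 8+, once a bucket grows past the treeify threshold (8 in OpenJDK) and the table
// is large enough, its linked list is converted to a red-black tree, keeping get() fast.
class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; }
    @Override public boolean equals(Object o) { return o instanceof BadKey && ((BadKey) o).id == id; }
    @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1_000; i++) {
            map.put(new BadKey(i), i);
        }
        System.out.println(map.get(new BadKey(500))); // still correct, and O(log n) within the bucket
    }
}
```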

Shell Scott