Code speaks louder than words, so:
final int size = 100;
Map<Integer, String> m = new HashMap<>(size);
for (int i = 0; i < size; i++) m.put(i, String.valueOf(i));
Why is the HashMap internally calling resize() 2 times! (Credit to Andreas for identifying that the JVM uses HashMaps internally; 19 of the 21 calls were from other processes.)

Two resize() calls are still not acceptable for my application. I need to optimize this.
If I am a new Java developer, my first intuitive guess at what "capacity" means in the HashMap constructor is that it is the number of elements that I (the consumer of HashMap) am going to put into the Map. But this is not true.
If I want to optimize my usage of HashMap so that it does not need to resize itself at all, then I need to know the internals of HashMap intimately enough to know exactly how sparse the HashMap bucket array needs to be. This is strange in my opinion. HashMap should implicitly do this for you. It is the whole point of encapsulation in OOP.
Note: I have confirmed that resize() is the bottleneck for my application's use case, which is why my goal is to reduce the number of calls to resize().
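For context, here is the arithmetic I believe is behind those two calls, as a sketch assuming OpenJDK's documented defaults (load factor 0.75, capacity rounded up to a power of two) rather than anything from HashMap's internals:

```java
// Sketch of why new HashMap<>(100) plus 100 puts triggers resize() twice,
// assuming OpenJDK's default load factor of 0.75. The hardcoded 128 reflects
// the documented rounding of the requested capacity to the next power of two.
public class ResizeMath {
    public static void main(String[] args) {
        int requested = 100;
        int tableCapacity = 128;                       // 100 rounded up to a power of two
        int threshold = (int) (tableCapacity * 0.75f); // 96: grow once size exceeds this
        // In OpenJDK the bucket table is allocated lazily, by the first resize()
        // call; a second resize() fires when the entry count passes the threshold.
        System.out.println(requested > threshold);     // true -> the map must grow once
    }
}
```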
The question:
If I know the exact quantity of entries I am going to put into the map beforehand, what capacity do I choose to prevent any extra resize() operations? Something like size * 10? I would also like some background on why HashMap is designed this way.
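One commonly used sizing formula (it matches what Guava's Maps.newHashMapWithExpectedSize does, to my understanding) is to pick a capacity whose threshold, capacity × 0.75, is at least the expected entry count. A sketch, with capacityFor as a hypothetical helper name:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    // Choose an initial capacity whose growth threshold (capacity * 0.75)
    // covers the expected number of entries, so no growth resize is needed.
    static int capacityFor(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.75);
    }

    public static void main(String[] args) {
        final int size = 100;
        // capacityFor(100) == 134; HashMap rounds that up to a 256-slot table
        // with a threshold of 192, so 100 puts stay below the growth trigger.
        Map<Integer, String> m = new HashMap<>(capacityFor(size));
        for (int i = 0; i < size; i++) m.put(i, String.valueOf(i));
        System.out.println(m.size()); // 100
    }
}
```

This avoids the growth resize, though on OpenJDK the lazy first resize() call that allocates the table still occurs.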
Edit: I am getting asked a lot why this optimization is necessary. My application is spending a non-trivial amount of CPU time in HashMap.resize(). The HashMaps my application uses are initialized with a capacity equal to the number of elements we put into them. Therefore, if we can reduce the resize() calls by choosing a better initial capacity, my application's performance improves.