
Suppose I know the exact number of key-value pairs that will go in the HashMap, and I know that it is not a power of 2. In these cases, should I specify the initial capacity or not? I could compute the nearest power of 2 and specify that, but I'd still like to know which is the better thing to do in such cases (when I don't want to calculate the nearest power of 2).

Thanks!

dimo414
Shubham
  • If memory serves, it gets rounded to the nearest power of 2 anyways, so go ahead specify an initial capacity. – awksp Jun 24 '14 at 13:00
  • @user3580294 I looked at the source code and it's not. That only happens if you use the constructor with another `Map` as parameter. – André Stannek Jun 24 '14 at 13:08
  • I'm *definitely* seeing it in the source code for the `initialCapacity` constructor, under the `tableSizeFor()` method. Looking at the Oracle 1.8.0_05 source. – awksp Jun 24 '14 at 13:10
  • @user3580294 no such thing in `java.util.HashMap` in Oracle 1.7.0_51. Cross-checking 1.8 for interest now ;-) – André Stannek Jun 24 '14 at 13:16
  • @user3580294 Seeing Mikulski's answer I see that it's done during the first put. But still not in the constructor in 1.7 – André Stannek Jun 24 '14 at 13:19
  • @user3580294 Okay, you are right for 1.8 :-) – André Stannek Jun 24 '14 at 13:21
  • @AndréStannek If the table size is rounded to a power of two in `put()` I suppose it'd make sense for the size to be a power of two when the table is first created. I don't have the 1.7 source handy, but I might check later once I have time. – awksp Jun 24 '14 at 13:24

2 Answers


If you look at the java.util.HashMap source code (Java 1.7; you can find it in the src.zip file in the JDK's directory), you will see that HashMap's put method uses the inflateTable method to create the array that stores the map's entries, and that this method always rounds the map's capacity up to a power of two greater than or equal to the size you specified.

Here is the method:

    private void inflateTable(int toSize) {
        // Find a power of 2 >= toSize
        int capacity = roundUpToPowerOf2(toSize);

        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
    }

Hence it does not matter if the size you specified is a power of two.
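
For illustration, here is a minimal standalone sketch of that rounding step (an approximation using `Integer.highestOneBit`, not the JDK's actual `roundUpToPowerOf2` helper):

    class PowerOfTwoDemo {
        // Rough equivalent of how HashMap rounds a requested capacity up to the
        // next power of two (an illustration only, not the JDK's own code).
        static int nextPowerOfTwo(int requested) {
            int highest = Integer.highestOneBit(requested);
            return highest == requested ? requested : highest << 1;
        }

        public static void main(String[] args) {
            System.out.println(nextPowerOfTwo(7));   // prints 8
            System.out.println(nextPowerOfTwo(16));  // prints 16
            System.out.println(nextPowerOfTwo(17));  // prints 32
        }
    }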

Bartosz Mikulski

You should consider the initial capacity a hint to the HashMap of approximately how much data to expect. By providing an accurate initial capacity, you minimize the number of times the map has to be rebuilt in order to scale up. If, for instance, you know you are planning to insert a million records, constructing the map with an initial capacity of 1,000,000 ensures that enough memory is allocated up front to handle that many inserts. After that point, further inserts may trigger a large O(n) resize operation inside a map.put() call.
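
If you really want to guarantee that a known number of entries never triggers a resize, one common idiom (a sketch that assumes the default 0.75 load factor) is to divide the expected size by the load factor before passing it to the constructor:

    import java.util.HashMap;
    import java.util.Map;

    class PresizedMapDemo {
        public static void main(String[] args) {
            int expectedEntries = 1_000_000;
            // Keep expectedEntries below the resize threshold (capacity * loadFactor),
            // where 0.75 is HashMap's default load factor.
            Map<Integer, String> map = new HashMap<>((int) Math.ceil(expectedEntries / 0.75));
            System.out.println(map.size()); // prints 0; the map is merely pre-sized
        }
    }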

Thinking of this initial capacity as a hint, rather than an instruction you expect HashMap to follow, may help you see that the optimization you're describing is unnecessary. HashMap is designed to behave well in all normal circumstances, and so while providing an initial capacity can help marginally, it's generally not going to have huge impacts on your code unless you are building many new large maps all the time. In such a case specifying the capacity would avoid the intermediate table resizing, but that's all.

As documented, you could introduce some unnecessary slowdowns if you specified too large of an initial capacity:

Iteration over collection views requires time proportional to the "capacity" of the HashMap instance

However in practice the wasted memory of allocating such large maps would likely cause problems for you sooner than the slightly slower iteration speed.

Be sure you read Why does HashMap require that the initial capacity be a power of two? as well.


One thing you might consider is switching to Guava's ImmutableMap implementation; if you know in advance the contents of your map, and you don't expect to change them, immutable collections will prove easier to work with and use less memory than their mutable counterparts.
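
For example, a fixed map can be built roughly like this (a sketch using Guava's builder; the keys and values here are made up):

    import com.google.common.collect.ImmutableMap;

    class ImmutableMapDemo {
        public static void main(String[] args) {
            // Contents are fixed at build time, so there is no later resizing or rehashing.
            ImmutableMap<String, Integer> wordCounts = ImmutableMap.<String, Integer>builder()
                    .put("alpha", 3)
                    .put("beta", 5)
                    .put("gamma", 8)
                    .build();
            System.out.println(wordCounts.get("beta")); // prints 5
        }
    }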


Here are some quick inspections I did using Scala's REPL (and some personal utility functions) to inspect what's going on inside HashMap (Java 1.7):

    // Initialize with capacity=7
    scala> new HashMap[String,String](7)
    res0: java.util.HashMap[String,String] = {}

    scala> getPrivate(res0, "table").length
    res1: Int = 8

    scala> ... put 7 values

    // Still internally using the same array
    scala> getPrivate(res0, "table").length
    res9: Int = 8

    // Specifying capacity 9 allocates a 16-length array
    scala> getPrivate(new HashMap[String,String](9), "table").length
    res10: Int = 16

    // Copying our first map into a new map interestingly
    // also allocates the default 16 slots, rather than 8
    scala> getPrivate(new HashMap[String,String](res0), "table").length
    res11: Int = 16

    scala> ... put 10 more values in our map

    scala> getPrivate(res0,"table").length
    res22: Int = 32

    // Copying again immediately jumps to 32 capacity
    scala> getPrivate(new HashMap[String,String](res0),"table").length
    res23: Int = 32
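
The `getPrivate` call above is one of those personal utility functions; a reflection helper along these lines would behave similarly (this particular implementation is illustrative, in Java rather than Scala, and is not the exact utility used above):

    import java.lang.reflect.Field;
    import java.util.HashMap;

    class GetPrivateDemo {
        // Reads a private field off an object via reflection; an approximation of the
        // getPrivate utility used in the REPL session above.
        static Object getPrivate(Object target, String fieldName) throws Exception {
            Field field = target.getClass().getDeclaredField(fieldName);
            // On Java 9+ this may require --add-opens java.base/java.util=ALL-UNNAMED
            field.setAccessible(true);
            return field.get(target);
        }

        public static void main(String[] args) throws Exception {
            HashMap<String, String> map = new HashMap<String, String>(7);
            map.put("key", "value");
            Object[] table = (Object[]) getPrivate(map, "table");
            System.out.println(table.length); // 8 on the JDK versions discussed here
        }
    }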
dimo414
  • I will try to re-run these Scala examples on Java 1.6 if I can get my hands on it. In the meantime, why not upgrade, [the future is here](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) :) – dimo414 Jun 24 '14 at 14:04
  • Thanks for your answer. I would upgrade if it was in my hands :) – Shubham Jun 25 '14 at 13:28