23

A HashMap has such a phrase from it's documentation:

If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

Notice how the documentation says rehash, not resize - even if a rehash will only happen when a resize will; that is when the internal size of buckets gets twice as big.

And of course HashMap provides such a constructor where we could define this initial capacity.

Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).

OK, seems easy enough:

// these are NOT chosen randomly...    
List<String> list = List.of("DFHXR", "YSXFJ", "TUDDY", 
          "AXVUH", "RUTWZ", "DEDUC", "WFCVW", "ZETCU", "GCVUR");

int maxNumberOfEntries = list.size(); // 9
double loadFactor = 0.75;

int capacity = (int) (maxNumberOfEntries / loadFactor + 1); // 13

So capacity is 13 (internally it is 16 - next power of two), this way we guarantee that documentation part about no rehashes. Ok let's test this, but first introduce a method that will go into a HashMap and look at the values:

private static <K, V> void debugResize(Map<K, V> map, K key, V value) throws Throwable {

    Field table = map.getClass().getDeclaredField("table");
    table.setAccessible(true);
    Object[] nodes = ((Object[]) table.get(map));

    // first put
    if (nodes == null) {
        // not incrementing currentResizeCalls because
        // of lazy init; or the first call to resize is NOT actually a "resize"
        map.put(key, value);
        return;
    }

    int previous = nodes.length;
    map.put(key, value);
    int current = ((Object[]) table.get(map)).length;

    if (previous != current) {
        ++HashMapResize.currentResizeCalls;
        System.out.println(nodes.length + "   " + current);
    }
}

And now let's test this:

static int currentResizeCalls = 0;

public static void main(String[] args) throws Throwable {

    List<String> list = List.of("DFHXR", "YSXFJ", "TUDDY",
            "AXVUH", "RUTWZ", "DEDUC", "WFCVW", "ZETCU", "GCVUR");
    int maxNumberOfEntries = list.size(); // 9
    double loadFactor = 0.75;
    int capacity = (int) (maxNumberOfEntries / loadFactor + 1);

    Map<String, String> map = new HashMap<>(capacity);

    list.forEach(x -> {
        try {
            HashMapResize.debugResize(map, x, x);
        } catch (Throwable throwable) {
            throwable.printStackTrace();
        }
    });

    System.out.println(HashMapResize.currentResizeCalls);

}

Well, resize was called and thus entries where rehashed, not what the documentation says.


As said, the keys were not chosen randomly. These were set-up so that they would trigger the static final int TREEIFY_THRESHOLD = 8; property - when a bucket is converted to a tree. Well not really, since we need to hit also MIN_TREEIFY_CAPACITY = 64 for the tree to appear; until than resize happens, or a bucket is doubled in size; thus rehashing of entries happens.

I can only hint to why HashMap documentation is wrong in that sentence, since before java-8, a bucket was not converted to a Tree; thus the property would hold, from java-8 and onwards that is not true anymore. Since I am not sure about this, I'm not adding this as an answer.

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • 4
    Interesting find - the [official] documentation is not perfect; it might very well be a divergence between documentation and implementation.. – user2864740 Oct 07 '18 at 20:56
  • @user2864740 you can also cast your vote [here if you want](https://stackoverflow.com/questions/45404580/hashmap-resize-method-implementation-detail). it's an old one, but still no answer – Eugene Oct 07 '18 at 20:58
  • 3
    Uhm... what is the question? – Jorn Vernee Oct 07 '18 at 21:32
  • @JornVernee ups, the question is, is this really a documentation flaw? the answer given by Stuart is "yes" – Eugene Oct 08 '18 at 08:02
  • 2
    Unrelated: you gave the right tip. I learned that my "small" helper project ... didnt recompile. I changed to yet another project, setup to use the java 7 language level, and yes, that compiles to java7. So unfortunately my question didn't make sense to stay around. So I had to delete it. What I really hate for upvoted questions, as I got way too many with 0 or less, from revenge downvoters ... but yours one here: nice! – GhostCat Oct 08 '18 at 09:15

1 Answers1

13

The line from the documentation,

If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

indeed dates from before the tree-bin implementation was added in JDK 8 (JEP 180). You can see this text in the JDK 1.6 HashMap documentation. In fact, this text dates all the way back to JDK 1.2 when the Collections Framework (including HashMap) was introduced. You can find unofficial versions of the JDK 1.2 docs around the web, or you can download a version from the archives if you want to see for yourself.

I believe this documentation was correct up until the tree-bin implementation was added. However, as you've observed, there are now cases where it's incorrect. The policy is not only that resizing can occur if the number of entries divided by the load factor exceeds the capacity (really, table length). As you noted, resizes can also occur if the number of entries in a single bucket exceeds TREEIFY_THRESHOLD (currently 8) but the table length is smaller than MIN_TREEIFY_CAPACITY (currently 64).

You can see this decision in the treeifyBin() method of HashMap.

    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {

This point in the code is reached when there are more than TREEIFY_THRESHOLD entries in a single bucket. If the table size is at or above MIN_TREEIFY_CAPACITY, this bin is treeified; otherwise, the table is simply resized.

Note that this can leave bins with rather more entries than TREEIFY_THRESHOLD at small table sizes. This isn't terribly difficult to demonstrate. First, some reflective HashMap-dumping code:

// run with --add-opens java.base/java.util=ALL-UNNAMED

static Class<?> classNode;
static Class<?> classTreeNode;
static Field fieldNodeNext;
static Field fieldHashMapTable;

static void init() throws ReflectiveOperationException {
    classNode = Class.forName("java.util.HashMap$Node");
    classTreeNode = Class.forName("java.util.HashMap$TreeNode");
    fieldNodeNext = classNode.getDeclaredField("next");
    fieldNodeNext.setAccessible(true);
    fieldHashMapTable = HashMap.class.getDeclaredField("table");
    fieldHashMapTable.setAccessible(true);
}

static void dumpMap(HashMap<?, ?> map) throws ReflectiveOperationException {
    Object[] table = (Object[])fieldHashMapTable.get(map);
    System.out.printf("map size = %d, table length = %d%n", map.size(), table.length);
    for (int i = 0; i < table.length; i++) {
        Object node = table[i];
        if (node == null)
            continue;
        System.out.printf("table[%d] = %s", i,
            classTreeNode.isInstance(node) ? "TreeNode" : "BasicNode");

        for (; node != null; node = fieldNodeNext.get(node))
            System.out.print(" " + node);
        System.out.println();
    }
}

Now, let's add a bunch of strings that all fall into the same bucket. These strings are chosen such that their hash values, as computed by HashMap, are all 0 mod 64.

public static void main(String[] args) throws ReflectiveOperationException {
    init();
    List<String> list = List.of(
        "LBCDD", "IKBNU", "WZQAG", "MKEAZ", "BBCHF", "KRQHE", "ZZMWH", "FHLVH",
        "ZFLXM", "TXXPE", "NSJDQ", "BXDMJ", "OFBCR", "WVSIG", "HQDXY");

    HashMap<String, String> map = new HashMap<>(1, 10.0f);

    for (String s : list) {
        System.out.println("===> put " + s);
        map.put(s, s);
        dumpMap(map);
    }
}

Starting from an initial table size of 1 and a ridiculous load factor, this puts 8 entries into the lone bucket. Then, each time another entry is added, the table is resized (doubled) but all the entries end up in the same bucket. This eventually results in a table of size 64 with one bucket having a linear chain of nodes ("basic nodes") of length 14, before adding the next entry finally converts this to a tree.

Output of the program is as follows:

===> put LBCDD
map size = 1, table length = 1
table[0] = BasicNode LBCDD=LBCDD
===> put IKBNU
map size = 2, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU
===> put WZQAG
map size = 3, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG
===> put MKEAZ
map size = 4, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ
===> put BBCHF
map size = 5, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF
===> put KRQHE
map size = 6, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE
===> put ZZMWH
map size = 7, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH
===> put FHLVH
map size = 8, table length = 1
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH
===> put ZFLXM
map size = 9, table length = 2
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM
===> put TXXPE
map size = 10, table length = 4
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE
===> put NSJDQ
map size = 11, table length = 8
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ
===> put BXDMJ
map size = 12, table length = 16
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ BXDMJ=BXDMJ
===> put OFBCR
map size = 13, table length = 32
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ BXDMJ=BXDMJ OFBCR=OFBCR
===> put WVSIG
map size = 14, table length = 64
table[0] = BasicNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ BXDMJ=BXDMJ OFBCR=OFBCR WVSIG=WVSIG
===> put HQDXY
map size = 15, table length = 64
table[0] = TreeNode LBCDD=LBCDD IKBNU=IKBNU WZQAG=WZQAG MKEAZ=MKEAZ BBCHF=BBCHF KRQHE=KRQHE ZZMWH=ZZMWH FHLVH=FHLVH ZFLXM=ZFLXM TXXPE=TXXPE NSJDQ=NSJDQ BXDMJ=BXDMJ OFBCR=OFBCR WVSIG=WVSIG HQDXY=HQDXY
Stuart Marks
  • 127,867
  • 37
  • 205
  • 259
  • 4
    I like the code, I love your answer, *but* shouldn't this sentence now be removed from HashMap documentation? – Eugene Oct 08 '18 at 08:11
  • 4
    @Eugene Yes the docs should probably be adjusted. This seems like a minor thing, i.e., just a small wording change, not full documentation of the tree-bins vs. resize policy, etc. Another bug [JDK-8211831](https://bugs.openjdk.java.net/browse/JDK-8211831) just came in requesting HashMap doc updates, so I'll fix this one at the same time. See my comments there. – Stuart Marks Oct 08 '18 at 21:21