7

I was wondering what the complexity of `replace(Key, Value)` for a `HashMap` is.

My initial thought is O(1), since it's O(1) to get the value and I can simply replace the value assigned to the key.

I'm unsure whether I should take into account the collisions that might occur in a large HashMap implemented in Java with java.util.
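
For concreteness, a minimal sketch of the call I mean:

import java.util.HashMap;
import java.util.Map;

public class ReplaceQuestion {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.replace("a", 2); // is this O(1), even for a large map with collisions?
    }
}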

Zabuzard
  • 25,064
  • 8
  • 58
  • 82
a_confused_student
  • 349
  • 1
  • 3
  • 13
  • 6
    It is `O(1)` **amortized**, just like `containsKey` or `put` or `remove`. Not amortized, it's probably `O(n)`, since any change might trigger a rehashing, potentially touching all entries. But who cares about non-amortized analysis. – Zabuzard Aug 25 '21 at 09:32
  • I want to expand on "But who cares about non-amortized analysis.": non-amortized analysis only matters in real-time use cases, where a maximum time per item has to be strongly enforced. Almost all use cases care more about throughput and average runtime, and then non-amortized analysis does become irrelevant. – Joachim Sauer Aug 25 '21 at 09:43
  • Just in case, if you are wondering [what is amortized time?](https://stackoverflow.com/questions/200384/constant-amortized-time) – Yousaf Aug 25 '21 at 09:47
  • 2
    Actually, `replace` only changes the **value**, which is not subject to hashing. So it actually runs in the same complexity as `get` or `contains`. – Zabuzard Aug 25 '21 at 09:53
  • 1
    @Zabuzard that's exactly what I was thinking, but the number of votes on your first comment made me think that I might be wrong :P – Yousaf Aug 25 '21 at 09:57
  • If I were to use a relatively large hashtable, are you saying there could be cases where everything would be subject to rehashing? – a_confused_student Aug 25 '21 at 10:00
  • @a_confused_student You might want to read: [Class HashMap](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html) – Yousaf Aug 25 '21 at 10:04
  • @a_confused_student No, `replace` does not touch the structure of the map at all. Rehashing and stuff can only happen if you modify the keys, so `put` or `remove` for example. – Zabuzard Aug 25 '21 at 10:09

3 Answers

7

tl;dr

HashMap#replace runs in O(1) amortized;

and under the premise that the map is properly balanced, which Java takes care of during your put and remove calls, also non-amortized.

Non-amortized

Whether this also holds for non-amortized analysis hinges on the implemented self-balancing mechanism.

Basically, since replace only changes the value, which influences neither the hashing nor the general structure of the HashMap, replacing a value will not trigger any re-hashing or re-organization of the internal structure.

Hence we only pay for the cost of locating the key, which depends on the bucket size.

The bucket size, if the map is properly self-balanced, can be considered a constant, leading to O(1) for replace also non-amortized.

However, the implementation triggers self-balancing and re-hashing based on heuristic factors only. A deep analysis of that is a bit more complex.

So the reality is probably somewhere in between due to the heuristics.


Implementation

To be sure, let us take a look at the current implementation (Java 16):

@Override
public V replace(K key, V value) {
    Node<K,V> e;
    if ((e = getNode(key)) != null) {
        V oldValue = e.value;
        e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
    return null;
}

The method afterNodeAccess is a dummy for subclasses and is empty in HashMap. Everything except getNode runs in O(1) trivially.
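
As a side note on the contract rather than the complexity, a minimal usage sketch (plain Map interface, nothing HashMap-specific assumed): replace returns the previous value, and if the key is absent it changes nothing and returns null:

import java.util.HashMap;
import java.util.Map;

public class ReplaceContractDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("alice", 1);

        Integer old = map.replace("alice", 2);  // existing key: value swapped, old value returned
        Integer none = map.replace("bob", 42);  // absent key: nothing inserted, null returned

        System.out.println(old);   // 1
        System.out.println(none);  // null
        System.out.println(map);   // {alice=2}
    }
}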

getNode

getNode is the canonical implementation of locating an entry in a HashMap, which we know runs in O(1) for a properly self-balancing map, like Java's implementation. Let's take a look at the code:

/**
 * Implements Map.get and related methods.
 *
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & (hash = hash(key))]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

This method basically computes the hash (hash = hash(key)), then looks up the corresponding bucket in the table (first = tab[(n - 1) & (hash = hash(key))]) and starts iterating through the data structure stored in that bucket.
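
As a small aside, hash(key) here is not the raw hashCode; a sketch of the spreading step (mirroring the static hash method in the OpenJDK sources) together with the index computation:

public class HashSpreadDemo {
    // Mirrors java.util.HashMap.hash(Object): XOR the upper 16 bits into the lower 16,
    // so that tables which only use the low bits via (n - 1) & hash still benefit
    // from the high bits of hashCode().
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // table length, always a power of two in HashMap
        String key = "example";
        int hash = spread(key);
        System.out.println("bucket index = " + ((n - 1) & hash));
    }
}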

Regarding the data structure for the bucket, we have a little branching going on at if (first instanceof TreeNode).

Bucket

The buckets are either simple implicitly linked lists or red-black trees.

Linked List

For the linked list, we have a straightforward iteration

do {
     if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
} while ((e = e.next) != null);

which obviously runs in O(m) with m being the size of the linked list.

Red-Black-Tree

For the red-black-tree, we have

return ((TreeNode<K,V>)first).getTreeNode(hash, key);

Lookup in a red-black-tree is O(log m), with m being the size of the tree.

Bucket size

Java's implementation makes sure to re-balance the buckets by rehashing if it detects that things get out of hand (you pay for that on each modifying method like put or remove).

So in both cases we can consider the size of the buckets as constant or, due to the heuristics involved with self-balancing, close to a constant.

Conclusion

Having the buckets at constant size, effectively, makes getNode run in O(1), leading to replace running in O(1) as well.

Without any self-balancing mechanism, in the worst case it would degrade to O(n) if a linked list is used, and to O(log n) for a red-black tree (in the case that all keys yield a hash collision).

Feel free to dig deeper into the code but it gets a bit more complex down there.

Zabuzard
  • 25,064
  • 8
  • 58
  • 82
  • No it is *not* O(1), it is O(1) amortized. A table with a large number of collisions gives O(log m) lookups, with m the number of entries in the target bucket. – Yann TM Aug 25 '21 at 10:09
  • @YannTM You pay the cost for collisions during `put` and `remove`, not during lookups like `containsKey`, `get`, `replace`. See the last paragraphs for *why*. **tldr**: the bucket size can be considered **constant** due to rehashing and self-balancing. – Zabuzard Aug 25 '21 at 10:12
  • This is ridiculous, the lookup in a hash table is simply not O(1). That would only be the case if there never were *any* collisions. It is normal to have a bit of collision in the table, put/remove *will not* guarantee the absence of collisions. – Yann TM Aug 25 '21 at 10:21
  • @YannTM Your conclusion is incorrect, suppose you would increase and rehash the full table each time a single bucket goes beyond size `100`. With that you could consider the bucket size a constant, independent of `n` already, despite having collisions. Self-balancing takes care of such stuff. That said, the current implementation uses heuristics to determine self-balancing factors and hence is somewhere in between both worlds. – Zabuzard Aug 25 '21 at 10:23
  • @YannTM Anyways, I reworded a bunch of stuff in my answer to give your observation more credit. – Zabuzard Aug 25 '21 at 10:31
  • It's not O(1) amortized time, it's O(1) *expected* time. Unlike amortized time, there's no guarantee that performing N operations is O(N). – Matt Timmermans Aug 25 '21 at 12:26
  • @MattTimmermans Hinges on the details of the balancing. If the implementation ensures that the buckets are **always** balanced enough to be considered independent of `n`, then any `replace` call will always run in `O(1)`, no matter what. If that's not the case, I (and the answer I wrote) agree with your statement. – Zabuzard Aug 25 '21 at 13:35
  • 1
    I prefer the new wording, thanks for the edit. – Yann TM Aug 25 '21 at 18:48
3

You are right, the main cost is the lookup, which is amortized O(1).

Replacing the associated value with the new one is O(1) once we have found the correct spot. But the lookup is only amortized O(1).

As shown in the code accompanying Zabuzard's answer (which I consider incorrect), Java's HashMap uses a classical approach: if you are lucky (the entry you are looking for is the first one in the bucket), you get O(1).

If you are less lucky, or you have a poor-quality hash function (just suppose the worst case: all elements map to the same hash key), then to avoid the dreaded O(n) of iterating a plain linked list in the bucket, Java's implementation uses a red-black tree of TreeNodes (structured much like a java.util.TreeMap) to provide O(log n) complexity.

So Java's HashMap, if used correctly, should yield basically O(1) replace, and if used incorrectly will degrade gracefully to O(log n) complexity. The threshold is the TREEIFY_THRESHOLD constant (its value is 8 in modern implementations).
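
To see that graceful degradation concretely, here is a small self-contained sketch (the class names are made up; TREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY are the actual HashMap constants): all keys collide, but because they implement Comparable the treeified bucket still gives O(log n) lookups and replaces:

import java.util.HashMap;
import java.util.Map;

public class TreeifyDemo {
    // Every key reports the same hash, so all entries land in one bucket.
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;
        CollidingKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
        // Comparable lets the treeified bucket order the keys properly,
        // which is what keeps the degraded case at O(log n).
        @Override public int compareTo(CollidingKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<CollidingKey, String> map = new HashMap<>();
        // Once the bucket exceeds TREEIFY_THRESHOLD (8) and the table has grown to
        // MIN_TREEIFY_CAPACITY (64) slots, the bucket is converted to a red-black tree.
        for (int i = 0; i < 1_000; i++) {
            map.put(new CollidingKey(i), "v" + i);
        }
        map.replace(new CollidingKey(500), "updated"); // O(log n) here, not O(n)
        System.out.println(map.get(new CollidingKey(500))); // prints "updated"
    }
}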

Please have a look at these implementation notes in the source: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/HashMap.java#L143-L231

Yann TM
  • 1,942
  • 13
  • 22
2

The basics:

  • java.util.HashMap will resize itself to match the given number of elements
  • so collisions are quite rare (compared to n)
  • (for collisions,) modern HashMap implementations use Trees (Node and TreeNode) inside buckets

For bucket collisions within one replace/contains/put/get operation:

  • if you have k colliding entries in one bucket (out of n entries total), a plain search through them costs O(k),
  • which with the tree search gets reduced to O(log2(k)) (e.g. k = 8 entries need at most about 3 comparisons)
  • which in O notation, with k being a small number, is equivalent to O(1).

Furthermore, worst case, hash collisions:

  • if you have a really bad hash generator that always gives the same result
  • so we get hash collisions
  • for hash collisions, the Node implementation functions like a LinkedList
  • you would have (with this LinkedList-like search) O(n/2) = O(n) complexity.
  • but this would have to be made on purpose, because
  • the prime-factor distribution and the prime-number modulo give really good distributions, as long as you don't have too many identical hashCode()s
  • most IDE-generated hashCode() implementations or simple id sequencing (like primary keys in databases) will provide a near-perfect distribution
    • with an id-sequenced hash function, you will not have any (hash or bucket) collisions, so you could actually just use array indexing instead of hash functions and collision handling

Also, check out the comments and the code yourself: https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

  • tableSizeFor(int cap)
  • getNode()

Specifically:

  • setting the table size for the bucket array (tableSizeFor) always rounds up to a power of two, so the mask n - 1 has the form 2^k - 1 (all low bits set)
  • getting the bucket is first = tab[(n - 1) & hash], with first being the head of the bucket
    • which is NOT, as I said, a modulo operation, but simply a bit-wise AND,
      • which is faster,
      • can use more valid bits
      • and, for power-of-two table sizes, picks the same bucket a modulo would (see the sketch right after this list)
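
Here is a tiny sketch (made-up demo class, not JDK code) to convince yourself that, for power-of-two table sizes, the bit-wise AND picks the same bucket a floor modulo would, just without the division:

public class BucketIndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length: HashMap always uses a power of two
        for (int hash : new int[] { 7, 17, 31, -5, 123_456_789 }) {
            int masked = (n - 1) & hash;          // what HashMap does
            int modulo = Math.floorMod(hash, n);  // floorMod, since hashes can be negative
            System.out.println(hash + " -> masked " + masked + ", modulo " + modulo);
        }
    }
}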

To illustrate how to research this yourself, I wrote some code showing worst-case (hash collision) behaviour:

import java.util.HashMap;

public class TestHashMapCollisions {

    static class C {
        private final String mName;

        public C(final String pName) {
            mName = pName;
        }

        // constant hash code: every key collides and all entries land in the same bucket
        @Override public int hashCode() {
            return 1;
        }
        @Override public boolean equals(final Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            if (getClass() != obj.getClass()) return false;
            final C other = (C) obj;
            if (mName == null) {
                if (other.mName != null) return false;
            } else if (!mName.equals(other.mName)) return false;
            return true;
        }
    }


    public static void main(final String[] args) {
        final HashMap<C, Long> testMap = new HashMap<>();
        // put a handful of keys that all collide into the map
        for (int i = 0; i < 5; i++) {
            final String name = "name" + i;
            final C c = new C(name);
            final Long value = Long.valueOf(i);
            testMap.put(c, value);
        }

        final C c = new C("name2");
        System.out.println("Result: " + testMap.get(c));
        System.out.println("End.");
    }
}

Procedure:

  • use an IDE
  • link the source code of the JDK/JRE you're using to your IDE
  • set a breakpoint to the line System.out.println("Result: " + testMap.get(c));
  • run in debug
  • debugger halts at the breakpoint
  • now go into the HashMap implementation
  • set a breakpoint to the first line of HashMap.getNode() (Node<K,V>[] tab; Node<K,V> first, e; int n; K k; )
  • resume debug; debugger will halt inside HashMap
  • now you can follow the debugger step-by-step

Hint: (You could immediately set the breakpoint inside HashMap, but this would lead to a little chaos, as HashMap is used quite often when the JVM initializes, so you'll hit a lot of unwanted stops first, before you get to testing your code)

JayC667
  • 2,418
  • 2
  • 17
  • 31