5

For a HashMap<String, String> map, each time a key-value pair is inserted into the map, a hash is calculated:

java.lang.String#hashCode

// Taken from the JDK 8 source of java.lang.String; 'hash' and 'value'
// are private fields of String ('value' holds the characters, 'hash'
// caches the computed hash code, with 0 meaning "not yet computed").
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

As the code is pretty self-explanatory, the complexity of the put operation is essentially the complexity of this hash calculation.

So what is the appropriate way to define the worst-case time complexity of HashMap's put/get operations?

If you are asking the same question from the hash-collision perspective, you can find the answer here: Is a Java hashmap really O(1)?

shakhawat
  • That code won't compile (where does `hash` come from?). Also, what *exactly* is your programming question / the problem you're having? – Frank Schmitt Jan 11 '18 at 18:23
  • 1
    @Todd this is not a duplicate - That question is referring to something different. – corsiKa Jan 11 '18 at 18:24
  • 1
    @FrankSchmitt That code is ripped from String source - `hash` is the cached hash value for String. True store, some strings (like "polygenelubricants", will hash to 0 causing them to be computed every time `hashCode()` is called!! – corsiKa Jan 11 '18 at 18:25
  • @ravi This is not the same question. Please have a look again – shakhawat Jan 11 '18 at 18:30
  • The hash function's complexity is not considered because `HashMap.put` complexity refers to the complexity with respect to the number of buckets in the map, i.e. its capacity. In this sense, it's `O(1)`. Of course, if the hash function takes too long to execute, it will have a negative impact on the overall performance of the whole data structure. – fps Jan 11 '18 at 18:37
  • 1
    But I think there are actually answers to this question here: https://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1 – Gonen I Jan 11 '18 at 18:39

3 Answers

8

When you calculate time complexity as a function of N, you must first decide what N represents. When we talk about the complexity of HashMap operations, N represents the size of the HashMap (i.e. the number of key-value pairs stored in the HashMap).

The time complexity of hashCode() of a given key does not depend on the number of entries in the HashMap, so it takes O(1) time to compute the hashCode(). This assumes the length of the String key in your example is not a function of the size of the Map: we could construct a pathological HashMap<String,String> where the i-th key put in the Map has i characters. In that edge case, the hashCode() calculation would take O(N) time, and as a result all the HashMap operations would require O(N) time instead of O(1).

Once you compute the hashCode(), it takes O(1) time to find out whether the key is already present in the Map (since the average number of entries in each bucket of the HashMap is bounded by a constant).
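
A minimal sketch of the pathological case described above, where the i-th inserted key has i characters (the class name is made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class GrowingKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 10_000; i++) {
            sb.append('x');
            // The i-th key has i characters, so hashing it costs O(i).
            // Over N insertions that is O(N^2) total hash work,
            // i.e. O(N) per put on average rather than O(1).
            map.put(sb.toString(), "value" + i);
        }
        System.out.println("entries: " + map.size()); // prints: entries: 10000
    }
}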

Eran
  • Can you clarify *"since the average number of entries in each bucket of the HashMap is bounded by a constant"* please? – bonnal-enzo Jan 25 '22 at 13:46
  • 1
    @bonnal-enzo `HashMap` has a load factor, which is usually < 1 (the default value is 0.75). For load factor 0.75, if a `HashMap` has 1000 buckets, it will have at most 750 entries (the number of buckets will be increased when you try to add the 751st entry). Therefore the average number of entries in each bucket <= 750/1000, which is a constant (and a small one). – Eran Jan 25 '22 at 13:54
  • Very clear, thanks. Do you know by which factor the number of buckets is typically multiplied when the threshold is reached? By 2? – bonnal-enzo Jan 25 '22 at 14:24
  • 1
    @bonnal-enzo yes. The number of buckets is a power of 2, and is multiplied by 2 when the HashMap is resized. – Eran Jan 25 '22 at 14:49
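
A small sketch of the resizing schedule described in this thread, using HashMap's documented defaults (16 initial buckets, load factor 0.75); the class name is made up:

public class ResizeSchedule {
    public static void main(String[] args) {
        int buckets = 16;               // HashMap's default initial capacity
        final float loadFactor = 0.75f; // HashMap's default load factor
        for (int i = 0; i < 5; i++) {
            // The map resizes once size exceeds buckets * loadFactor,
            // and the bucket count always doubles (stays a power of 2).
            System.out.printf("buckets=%d, resize when adding entry #%d%n",
                    buckets, (int) (buckets * loadFactor) + 1);
            buckets *= 2;
        }
    }
}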
5

Big O notation is talking about the complexity of the operation. Most operations become more complex (i.e. take more time) when there are more elements involved and the notation describes how that complexity grows relative to the number of elements.

With O(1), you are saying that the operation is independent of the number of elements involved. The hash operation may be fast or slow for its own reasons, but that speed will not vary whether you have a HashMap of 1 element or a googolplex of them.

It should be noted that O(1) is the amortized average, not a guarantee. The worst case is usually considered O(n), assuming a hash function that returns the same hash each time, but it's conceivable (as proposed by user889742 in the comments) to have a deliberately bad hashCode function that performs even worse than that.
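
A sketch of that worst case: a key type with a deliberately constant hashCode(), so every entry lands in the same bucket (BadKey and WorstCaseDemo are made-up names; note that since Java 8, HashMap converts large buckets into balanced trees, so degraded lookups cost O(log n) rather than O(n)):

import java.util.HashMap;
import java.util.Map;

// A key whose hashCode() always collides, forcing every entry into one bucket.
final class BadKey {
    private final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; } // constant: worst case

    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
}

public class WorstCaseDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i); // every put probes the same bucket
        }
        System.out.println(map.size()); // 10000, built far more slowly than
                                        // with a well-distributed hash
    }
}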

Danikov
  • It is entirely possible that the size of each element is related to the total number of elements. ( For example, serial numbers, needing more digits as the number of elements grow. ) In this case, this hash function would get slower as the number of elements grow. – Gonen I Jan 11 '18 at 18:35
  • 1
    You're absolutely right that some hashcode implementations can cause a HashMap not to behave with O(1) complexity on insertion, O(1) is the average that presumes a good hashcode implementation. Worst-case is usually considered O(n) where all hashes are the same. I'm not sure if deliberately bad hashcodes have been considered; one that performs an infinite loop has pretty poor performance. – Danikov Jan 11 '18 at 18:44
3

You have to know what the (1) is relative to. It's the number of elements in the map. The cost to insert into a HashMap doesn't change depending on how many elements are in the map (assuming you've amortized the cost of the inserts over the entire lifetime of the structure).

You're correct that computing the hashCode of a String is O(n), where n is the length of the String. But once you have it, you have it forever, no matter how many times you use it (String caches the value in its hash field, as shown in the question). So its cost is considered constant.
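
A minimal sketch of that caching behavior (String.repeat requires Java 11+; the class name is made up):

public class HashCacheDemo {
    public static void main(String[] args) {
        String key = "abc".repeat(100_000); // a long key: hashing it once is O(n)
        int first = key.hashCode();   // walks every character, caches the result
        int second = key.hashCode();  // returns the cached value: O(1)
        System.out.println(first == second); // true
    }
}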

corsiKa