How are HashMap et al really O(1)?

Question

I'm studying Java collection performance characteristics, Big O notation and complexity, etc. There's a real-world part I can't wrap my head around, and that's why HashMap and other hash containers are considered O(1), which should mean that finding an entry by key in a 1,000 entry table should take about the same time as a 1,000,000 entry table.

Let's say you have HashMap myHashMap, stored with a key of first name + last name. If you call myHashMap.get("FredFlinstone"), how can it instantly find Fred Flinstone's Person object? How can it not have to iterate through the set of keys stored in the HashMap to find the pointer to the object? If there were 1,000,000 entries in the HashMap, the list of keys would also be 1,000,000 long (assuming no collision), which must take more time to go through than a list of 1.000, even if it were sorted. So how can the get() or containsKey() time not change with n?

Note: I thought my question would be answered in Is a Java hashmap really O(1)? but the answers didn't really address this point. My question is also not about collisions.

By the way, it's not a _list of keys_ as you put it. It's an array where the indices are related to the hashcode of the object. — Sotirios Delimanolis, Sep 30 '13 at 01:50
You forgot the "Hash" part of "HashMap". You get the hashcode of the object, modulo (`%`) it the size of your array of ["HashBuckets"](http://www.catb.org/jargon/html/H/hash-bucket.html), then once you are in the bucket you do your linear search. — Scott Chamberlain, Sep 30 '13 at 02:01
And for those of you who said my question is a duplicate, please read the last line of my post, and then tell me where in that post my question is answered — dj_segfault, Sep 30 '13 at 04:45

score 4 · Answer 1 · answered Sep 30 '13 at 01:49

4

"My question is also not about collisions." - Actually it is indirectly. No collision = O(1) ...

In the worst case (pathological case), there would be one bucket with N items hanging off it, then it would be O(N)

answered Sep 30 '13 at 01:49

Mitch Wheat

295,962
43
465
541

1

Well, that's part of my point. O( ) (as opposed to theta, etc) is supposed to be the worst case. – dj_segfault Sep 30 '13 at 04:23

score 1 · Answer 2 · answered Sep 30 '13 at 01:52

Computing the hash function on a given key takes constant time. Looking up whether there is a value stored to that key is a random access operation - the hashmap is backed with an array. The only problem is being assured that different keys with the SAME value (hash collision) doesn't happen too often. If it happened once in n, that's enough for constant time in the average case.

score 1 · Accepted Answer · answered Sep 30 '13 at 02:01

1

Let's take a look at a very simple example of a hash map and a hash function. To keep things simple, let's say that your hash map has 10 buckets and that it uses integers as keys. For the purposes of this example we shall use the following hash function:

public int hash(int key) {
  return key % 10;
}

Now, when we want to store an entry in the map, we hash the key, get an integer between 0-9 and then put that entry in the corresponding bucket. Then, when we need to lookup a key, we just have to compute it's hash and we know exactly what bucket it is in (or would be in) without having to look in any of the others.

answered Sep 30 '13 at 02:01

Aurand

5,487
1
25
35

That makes sense, but what happens when the key is a string, like in my example? I understand String has a has hashCode, but the possible values are in the millions. In order to be useful, there would have to be thousands of buckets with thousands of values. Once you're in the right bucket, you're still visiting thousands of keys. So what am I still missing? – dj_segfault Sep 30 '13 at 04:36
@dj_segfault Given that the hashCode method returns an int (and will generally only return a positive number), there are in fact about 2 billion (2^31) different hash codes. The map will map that hash code to a specific bucket (by modding by the number of buckets for example). The odds of two keys colliding is a function of the size of your map. As long as your map is big enough relative to the number of objects you expect to put into it collisions will be very rare. – Aurand Sep 30 '13 at 05:05
OK, I think we're getting to the point of my confusion. Thank you for your patience. So if there are about 2B possible has codes, in order to find a key in one hop, you would need 2B buckets, right? If you had l00K buckets with 1K items, once you do the modulo trick to jump to the right bucket you're still going through the 1K items to find the match – dj_segfault Sep 30 '13 at 05:19
1

You're assuming that all the buckets are full. If you actually want to store several billion objects in a single map, then sure, you're going to run into some performance problems. If you want to store n strings and you have a map with n*2 to n*3 buckets, you will almost never have > 1 item per bucket. – Aurand Sep 30 '13 at 05:51
1

Sorry for the delay in accepting your answer. I wanted to talk it over with some people first to make sure I really understood it. – dj_segfault Oct 03 '13 at 17:32

How are HashMap et al really O(1)?

3 Answers3