
In a hashmap, how is the hashcode lookup a constant O(1)?

We know that internally a hashmap uses an array of buckets to hold its key-value entries, and that it generates a hash code for each key using a hashing function. We also know that a hashmap lookup takes constant time (assuming there are no collisions). Whenever we ask a hashmap for the value of a given key, it first calculates the bucket location (i.e. the index of the array that the key's hashcode maps to), and then it fetches the value. I understand that the second part takes constant time. But what about the first part? How is looking up the array index for a hashcode constant, especially when the hashmap holds millions of values?
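
Roughly, this is my mental model of the two parts (just a sketch with made-up names, not the real java.util.HashMap code):

    // Roughly what I picture happening inside get(key); 'table' stands for the
    // internal bucket array (hypothetical names, not the real HashMap fields).
    static Object lookup(Object[] table, Object key) {
        int hash = key.hashCode();                            // first part (a): compute the hash code
        int bucketIndex = Math.floorMod(hash, table.length);  // first part (b): map the hash code to an array index
        return table[bucketIndex];                            // second part: fetch the value at that index
    }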

My StackOverflow search found multiple questions on hashmaps, but they mostly answered the second part of my question and not the first.

A few of the links I found:

  1. Why hashmap lookup is O(1) i.e. constant time?
  2. Can hash tables really be O(1)

I also found this question posted by a user at javarevisited.blogspot:

Hi Javin, I need clarification on one of my recent interview questions. For search and sort, which Collection data structure should be preferred: ArrayList or LinkedList? I mentioned that ArrayList would be the choice for retrieval operations as it implements RandomAccess, whereas LinkedList would be a better choice for insertion/deletion as each node holds pointers to the nodes before and after it. My follow-up question: so do you mean to say retrieval is faster using an ArrayList which holds 1 million records? I said if the index is known we can use contains() and get the value. But clarify this 1 million scenario for me in a real dynamic case, i.e. without knowing the index. Would ArrayList still be faster?

Sam
  • The whole point of big-O time is to describe how an algorithm scales for large values of `n` on an ideal machine. Can you clarify what your question is? – Peter Lawrey Dec 05 '15 at 20:44
  • "even though its true less value" ??? –  Dec 05 '15 at 20:46
  • Thank you Peter for your response. My main question is how the hashcode lookup (the very first part, getting the bucket location) is constant time. – Sam Dec 05 '15 at 20:47
  • @Yves Daoust, let's say the hash map is holding just 10 values, then it's OK. I want to understand how the lookup of the hashcode value is constant for a hashmap of very large size. – Sam Dec 05 '15 at 20:50
  • @sam: I am afraid that you have a wrong understanding of how a memory access is performed. –  Dec 05 '15 at 20:52
  • What did you find unsatisfactory about the other questions you mentioned? – Dawood ibn Kareem Dec 05 '15 at 20:54
  • Hashmaps *average out* to constant time. A simple function `f(key) -> bucket location` takes constant time. The first part of the lookup is just figuring out where the value for a particular key is, and when Big-O adds many `O(1)` operations together it is still `O(1)`. It might help if you provided a hashmap algorithm and pointed to the part you think doesn't take constant time. – Nathaniel Ford Dec 05 '15 at 20:54
  • @Yves Daoust Thank you for your responses. I agree that something is not very clear to me; that is why I posted here. And it seems you got my problem. Please point me to any reference that can clarify my doubt about memory access, constant index lookup, or whatever you feel can help me. – Sam Dec 05 '15 at 20:55
  • Main computer memory allows direct access. You provide an address and get the data at that address, wherever it is, in constant time (there are cache issues, but let's not clutter the discussion with this). –  Dec 05 '15 at 20:56
  • @Nathaniel Ford, let's say a hashmap holds 20 key-value pairs, all different. So there are 20 hashcodes, i.e. 20 indexes for bucket locations. Let's say the key-value pair I want to look up is at the last index, so the index lookup would informally take O(20). Now let's say the same hash map grows to 1 million entries, and again the key-value pair I want to find is at the last index. Obviously the index (hashcode) lookup will take more time compared to the hash map of size 20. Then how does the hashmap achieve constant lookup time? I hope it is clear what I am trying to ask. – Sam Dec 05 '15 at 21:01
  • @David Wallace, apologies if my words came across as harsh or negative. I just wanted to clarify my doubt with the help of the mentioned links, but somehow I was not satisfied with the answers given in those links. – Sam Dec 05 '15 at 21:03
  • No, not at all, I'm just trying to work out what information you still need. To me the answers to those other questions seemed fairly clear. I don't understand why you think a memory lookup takes longer if more of the memory locations are "used". You say "obviously", but it's actually not the case. – Dawood ibn Kareem Dec 05 '15 at 21:07

3 Answers


You seem to have a misunderstanding about data structures. When you create an array, a contiguous block of memory is reserved for it. The size of that block is the number of elements in the array multiplied by the size of each element.

Therefore, an array holding eight 2-byte numbers would be 16 bytes.

Let's say we want the fourth element (index 3): we can find it without iteration because we know two things about the data structure: where it starts and the size of each element. If we multiply 2 bytes by 3 (3 = 4 - 1: remember we are zero-indexed), we get 6, so the element we want starts 6 bytes past the beginning of our array.
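
To make the arithmetic concrete, here is a small illustration (hypothetical sizes; Java does this address math for you behind the scenes):

    // Offset arithmetic for the example above (hypothetical values).
    int elementSize = 2;              // each element occupies 2 bytes
    int index = 3;                    // the fourth element (zero-indexed)
    int offset = index * elementSize; // 3 * 2 = 6 bytes past the start of the array
    // The element's location is simply: start of the array + offset.
    // One multiplication and one addition, no matter how long the array is.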

Hashmaps are usually backed by arrays of this nature. The calculation of where the desired array element starts is more complicated, but it can still be done without iteration; therefore it is O(1). The value found at that array location is the actual place in memory of the value being retrieved.

  • The function calculating the location in the backing array (where the memory location of the value is stored), using the key provided, is a constant time operation.
  • Reading the memory location calculated in the first step is a constant time operation.
  • Reading the memory location stored in the location read in the second step is a constant time operation.

Thus, the entire operation happens in constant time.
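
As a minimal sketch of those three steps (collisions and resizing ignored; the names are made up and this is not the real java.util.HashMap source):

    // A toy map illustrating why each step is constant time.
    class TinyMap<K, V> {
        private final Object[] buckets = new Object[1 << 20];  // about a million slots

        void put(K key, V value) {
            buckets[indexFor(key)] = value;   // ignores collisions entirely
        }

        @SuppressWarnings("unchecked")
        V get(K key) {
            int i = indexFor(key);      // step 1: computed from the key alone, in constant time
            Object slot = buckets[i];   // step 2: one array read at a computed offset
            return (V) slot;            // step 3: follow the stored reference to the value
        }

        private int indexFor(K key) {
            // hashCode() looks only at the key, never at how many entries the map holds,
            // so this costs the same whether there are 20 entries or 20 million.
            return Math.floorMod(key.hashCode(), buckets.length);
        }
    }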

Nathaniel Ford

Array lookup at a given index is done in constant time. Actually, it is a simple address computation (base + index * stride) followed by indirection.
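
For example (illustrative addresses; Java hides raw pointers, but the JVM performs exactly this kind of arithmetic when you index an array):

    // The same address computation whether the array has 10 elements or 1 million.
    long base = 0x1000;                              // hypothetical start address of the array
    long stride = 4;                                 // bytes per element, e.g. for an int[]
    long addressOf3 = base + 3 * stride;             // element 3 of a 10-element array
    long addressOf999999 = base + 999_999 * stride;  // element 999,999 of a million-element array
    // Both are one multiply and one add: constant time, independent of the array's length.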

  • So basically you are trying to say that the hashcode is nothing but a memory address? – Sam Dec 05 '15 at 20:49
  • Nope, the hash code is an index. From the index you compute the address by a simple linear transform. –  Dec 05 '15 at 20:50

Once you have the hashcode, you can find the cell (bucket) index in constant time too:

    cellIndex = hash(X) % array.length

So the whole lookup takes constant time.
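
For instance (a rough sketch; the real java.util.HashMap uses a power-of-two table and bit masking instead of %, but either way the cost does not depend on how many entries the map holds):

    // Computing the cell index for a concrete key.
    String key = "example";
    int hash = key.hashCode();                            // depends only on the key itself
    int smallTableIndex = Math.floorMod(hash, 16);        // index into a 16-cell array
    int largeTableIndex = Math.floorMod(hash, 1 << 20);   // index into a ~1-million-cell array
    // Either way: one hash computation plus one modulo, i.e. constant time.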

VDanyliuk