5

I don't understand how hash tables are constant time lookup, if there's a constant number of buckets. Say we have 100 buckets, and 1,000,000 elements. This is clearly O(n) lookup, and that's the point of complexity: to understand how things behave for very large values of n. Thus, a hash table is never constant lookup; it's always O(n) lookup.

Why do people say it's O(1) lookup on average, and only O(n) for worst case?

CaptainForge
  • 1,365
  • 7
  • 21
  • 46
  • 5
    In short, it is because the number of buckets is increased depending on the amount of data put into the table. See this: http://stackoverflow.com/questions/9214353/hash-table-runtime-complexity-insert-search-and-delete – justhalf Oct 14 '16 at 03:40
  • I think part of the confusion is what you're defining *n* as... e.g. the total number of elements in the table, or the length of the key, or ...? – LarsH May 24 '23 at 15:28

4 Answers

3

In layman terms with some hand waving:

At one extreme, you can have a hash map that is perfectly distributed, with one value per bucket. In this case, your lookup returns the value directly, and the cost is one operation -- or on the order of one, if you like: O(1).

In the real world, implementations often arrange for that to be the case by expanding the size of the table, etc. to meet the requirements of the data. When you have more items than buckets, complexity starts to increase.

In the worst case, you have one bucket and n items in that one bucket. In this case, it is basically like searching a list linearly, so if the value happens to be the last one, you need n comparisons to find it. Or, on the order of n: O(n).

The latter case is pretty much always *possible* for a given data set; it is theoretically possible to engineer a dataset that will cause collisions. That's why so much study and effort has been put into coming up with good hashing algorithms. So there is some way to end up with O(n) performance, unless the implementation tweaks other aspects: table size, hash implementation, etc.
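
To make the hand waving a little more concrete, here is a minimal sketch of a separate-chaining table in Java (class and method names are made up for illustration; no resizing, no deletion, no duplicate-key handling). Finding the bucket is a single index computation; the length of the chain you then scan is what separates the O(1) case from the O(n) case.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal separate-chaining hash table, illustrative only.
class ToyHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets;

    ToyHashTable(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) buckets.add(new ArrayList<>());
    }

    private int indexFor(K key) {
        // Map the key's hash code to a bucket index.
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    void put(K key, V value) {
        buckets.get(indexFor(key)).add(new Entry<>(key, value));
    }

    V get(K key) {
        // Finding the bucket is one index computation: O(1).
        // Scanning the chain costs O(chain length): about one entry with a
        // good hash and enough buckets, but n entries if every key collides.
        for (Entry<K, V> e : buckets.get(indexFor(key))) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```

With roughly as many buckets as entries and a well-distributed hash, each chain holds about one entry; with one bucket (or one hash value), `get` degenerates into a linear scan.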

Samantha
  • 975
  • 6
  • 20
Jameson
  • 6,400
  • 6
  • 32
  • 53
3

The purpose of using a hash is to be able to index into the table directly, just like an array. In the ideal case there's only one item per bucket, and we achieve O(1) easily.

A practical hash table will have more buckets than it has elements, so that the odds of any bucket holding more than one element are low. If the number of elements inserted into the table gets too great, the table will be resized to increase the number of buckets.

There is always a possibility that every element will have the same hash, or that all active hashes will be assigned to the same bucket; in that case the lookup time is indeed O(n). But a good hash table implementation will be designed to minimize the chance of that occurring.
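
As a rough illustration of that worst case (assuming Java's `HashMap`; the `BadKey` class is invented for the example), a key type whose `hashCode()` is constant forces every entry into the same bucket, so each lookup has to compare keys one by one instead of jumping straight to the entry. Recent JDKs soften this particular case by converting overfull buckets into balanced trees, but the point about distribution stands.

```java
import java.util.HashMap;
import java.util.Map;

// A deliberately bad key: every instance hashes to the same value,
// so every entry is assigned to the same bucket.
class BadKey {
    private final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; }   // constant hash
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
}

public class WorstCaseDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i);
        }
        // Every lookup lands in the same bucket, so the map must compare
        // keys one by one rather than indexing straight to the entry.
        System.out.println(map.get(new BadKey(9_999)));
    }
}
```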

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
1

By saying

Say we have 100 buckets, and 1,000,000 elements.

you are basically depriving the hash map of its real power, rehashing, and also not considering an initial capacity chosen in accordance with need. A hash map is most efficient when each entry gets its own bucket; a lower percentage of collisions can be achieved with a higher capacity. Each collision means you need to traverse the corresponding list.
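
For instance (using Java's `HashMap` as the example; the numbers and class name are arbitrary), the `(initialCapacity, loadFactor)` constructor lets you size the table for the expected data up front, so the map never has to operate with far more entries than buckets.

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        int expectedEntries = 1_000_000;

        // Default construction starts small and rehashes repeatedly as it grows.
        Map<Integer, Integer> grown = new HashMap<>();

        // Sizing the table up front (capacity * loadFactor >= expected entries)
        // keeps the load low and avoids intermediate rehashes.
        Map<Integer, Integer> presized =
                new HashMap<>((int) (expectedEntries / 0.75f) + 1, 0.75f);

        for (int i = 0; i < expectedEntries; i++) {
            grown.put(i, i);
            presized.put(i, i);
        }
        // Both give O(1) average lookups; the pre-sized map simply never had
        // to grow, because buckets were never scarce to begin with.
        System.out.println(grown.get(123_456) + " " + presized.get(123_456));
    }
}
```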

Saurav Sahu
  • 13,038
  • 6
  • 64
  • 79
0

The points below should be considered for a hash table implementation.

  1. A hash table is designed such that it resizes itself when the number of entries grows larger than the number of buckets by a certain threshold (the load factor). This is how we should design it if we wish to implement our own custom hash table; see the sketch after this list.

  2. A good hash function makes sure that entries are well distributed across the buckets of the hash table. This keeps the list in each bucket short.

Together, the above keep the access time constant on average.
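
As a sketch of point 1 in Java (a hypothetical `ResizableTable`; updating an existing key and shrinking are omitted for brevity), a resize is triggered when the entry count crosses `bucketCount * LOAD_FACTOR`, and every entry is rehashed into the larger bucket array:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of resize-on-threshold for a custom chained hash table.
class ResizableTable<K, V> {
    private static final double LOAD_FACTOR = 0.75;

    private static class Entry<K, V> {
        final K key; V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private List<List<Entry<K, V>>> buckets = newBuckets(16);
    private int size = 0;

    private static <K, V> List<List<Entry<K, V>>> newBuckets(int n) {
        List<List<Entry<K, V>>> b = new ArrayList<>(n);
        for (int i = 0; i < n; i++) b.add(new ArrayList<>());
        return b;
    }

    void put(K key, V value) {
        // Grow before the chains get long; existing-key updates are ignored here.
        if (size + 1 > buckets.size() * LOAD_FACTOR) {
            resize();
        }
        buckets.get(indexFor(key, buckets.size())).add(new Entry<>(key, value));
        size++;
    }

    private void resize() {
        // Double the number of buckets and rehash every entry into the new array.
        List<List<Entry<K, V>>> bigger = newBuckets(buckets.size() * 2);
        for (List<Entry<K, V>> chain : buckets) {
            for (Entry<K, V> e : chain) {
                bigger.get(indexFor(e.key, bigger.size())).add(e);
            }
        }
        buckets = bigger;
    }

    private int indexFor(K key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }
}
```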

nits.kk
  • 5,204
  • 4
  • 33
  • 55