2

If a hash table holds N distinct items, and is not overloaded, then the hashes for the N items must have at least approximately lg(N) bits; otherwise too many items will get the same hash value.

But a hash table lookup is usually said to take O(1) time on average.

It's not possible to generate lg(N) bits in O(1) time, so the standard results for the complexity of hash tables are wrong.
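
To make the premise concrete, here is a minimal sketch, assuming a toy polynomial hash (the function is illustrative, not any particular library's): any hash that distinguishes all N keys must read the whole key, and the keys themselves are at least lg(N) bits long.

```python
def poly_hash(key: bytes) -> int:
    """Toy polynomial hash: it reads every byte of the key, so its
    running time grows linearly with the key's length."""
    h = 0
    for b in key:                  # one multiply-add per input byte
        h = h * 31 + b
    return h

# N distinct keys need at least lg(N) bits just to be distinct,
# so this loop runs Omega(lg N) times.
print(poly_hash(b"example"))
```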

What's wrong with my reasoning?

Paul Hankin
  • O(1) is just an expected bound :D – Emadpres Nov 25 '14 at 13:21
  • See http://cs.stackexchange.com/q/1643 (which links what seems to be an identical question on that site: http://cs.stackexchange.com/q/27748) – Dan Getz Nov 25 '14 at 13:44
  • Possible duplicate of [Can hash tables really be O(1)?](http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1) – DavidRR May 14 '17 at 17:44
  • @DavidRR It would be a shame to lose the accepted answer here. The accepted answer of the other question does not seem to me to be sound -- it asserts the impossible -- that one can have a fixed size hash function (reading a fixed number of bits of input) and yet preserve O(1) lookups as the number of entries in the table increases to infinity. The answer of R. Martinho Fernandes is exactly correct in my opinion, and I found it hugely helpful in my understanding of complexity theory. – Paul Hankin May 14 '17 at 18:12
  • @PaulHankin Answers to a closed question are never lost. It is just that new answers cannot be added to a closed question. That said, perhaps it would be fairer to say that your question and the one that I referenced are *related* rather than *duplicates*. – DavidRR May 14 '17 at 18:58

3 Answers

11

What is wrong with your reasoning is the use of conflicting definitions of "time".

When one says that lookup in a hash table takes O(1) time, one usually means that it takes O(1) comparisons; that is, the number of comparisons required to find an item is bounded above by a constant. Under this notion of "time", the actual time (the thing you would measure in seconds) spent computing the hash does not enter into the count at all.

Measuring time in comparisons is an approximation that, while it may not reflect reality in the same way that measuring it in seconds would, still provides useful information about the behaviour of the hash table.

This sort of thing is true for most asymptotic complexity descriptions of algorithms: people often use "time" with a very abstract meaning that isn't the informal meaning of "time", but more often than not is some variation of "number of operations" (with the kind of operation often left unstated, expected to be obvious, or clear from context).
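
To make this concrete, here is a hedged sketch (a toy chained table with made-up names, not any library's implementation) that counts only key comparisons during lookup, which is the quantity the O(1) claim refers to:

```python
class ChainedTable:
    """Toy hash table with separate chaining, instrumented to
    count key comparisons made during lookup."""
    def __init__(self, nbuckets=16):
        self.buckets = [[] for _ in range(nbuckets)]
        self.comparisons = 0

    def insert(self, key, value):
        self.buckets[hash(key) % len(self.buckets)].append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            self.comparisons += 1   # this is what "O(1) time" counts
            if k == key:
                return v
        return None

t = ChainedTable()
for i in range(100):
    t.insert("key%d" % i, i)
t.lookup("key42")
# With a good hash and a constant load factor, the expected number of
# comparisons per lookup is O(1), regardless of how long computing
# hash(key) itself takes.
print(t.comparisons)
```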

R. Martinho Fernandes
0

The analysis is based on the assumption that the hash function is fixed and not related to the actual number of elements stored in the table. Rather than saying that the hash function returns a lg(N)-bit value when there are N elements in the hash table, the analysis assumes a hash function that returns, say, a k-bit value, where k is independent of N. Typical values of k (such as 32 or 64) provide for a hash table far larger than anything you need in practice.

So in one sense, yes, a table holding N elements requires a hash function that returns O(lg N) bits; but in practice, a constant that is far larger than the anticipated maximum value of lg N is used.
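
A minimal sketch of that convention (assuming Python's built-in hash, which already returns a fixed-width machine-word value; the helper name is made up):

```python
K_BITS = 64                      # fixed hash width, independent of N
MASK = (1 << K_BITS) - 1

def bucket_index(key, nbuckets):
    """Truncate the hash to a fixed k-bit value, then reduce it
    modulo the bucket count. k does not grow with N; it is simply
    chosen larger than lg(N) for any table you will ever build."""
    h = hash(key) & MASK         # k-bit hash value
    return h % nbuckets

print(bucket_index("example", 1024))
```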

chepner
-1

Hash table search is O(1) on average. I think you are mixing up insertion (which can be O(n) in the worst case, when the table has to be resized) and search.
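
For completeness, a hedged sketch of why insertion is usually described as amortized O(1) (a toy table with doubling; names are illustrative):

```python
class GrowingTable:
    """Toy chained table that doubles when full. A single insert
    that triggers a resize is O(n), but doubling makes the
    amortized cost per insert O(1)."""
    def __init__(self):
        self.buckets = [[] for _ in range(8)]
        self.count = 0

    def insert(self, key, value):
        if self.count >= len(self.buckets):        # load factor 1: grow
            old = [item for b in self.buckets for item in b]
            self.buckets = [[] for _ in range(2 * len(self.buckets))]
            for k, v in old:                       # O(n) rehash, but rare
                self.buckets[hash(k) % len(self.buckets)].append((k, v))
        self.buckets[hash(key) % len(self.buckets)].append((key, value))
        self.count += 1
```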

Daniel Kec