
Looking at the Wikipedia article on hash tables, it says that inserting and searching are O(1). But my teacher told me that only the lookup is O(1) and that hashing is O(s), where s is the length of the string. Shouldn't inserting and searching then be O(s) instead, since an operation costs hashing(s) + lookup = O(s) + O(1) = O(s)?

Could anyone explain what the correct way of writing the time complexity of hash table operations in big O notation is, and why? Assume perfect hashing, so no collisions occur.
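For concreteness, here is a minimal sketch of what I mean, using a typical string hash (djb2) that has to read every character:

```python
def djb2(s: str) -> int:
    """Classic djb2 string hash: it reads every character, so it costs O(s)."""
    h = 5381
    for ch in s:                          # one pass over the full string
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h

# So inserting or searching for a string key seems to cost
# O(s) for hashing + O(1) for the bucket probe = O(s) overall.
```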

Sigils
  • The data structure abstracts over the item it inserts/searches/..., since that is more or less a black box. After all, you might want to insert an `int`, which is perhaps hashed in *O(1)*, or an array that you hash in *O(n^2)*. Hashing a string is not *per se* *O(s)*; one could, for example, use a bad hashing algorithm that only uses the first character (sketched just after these comments). – Willem Van Onsem Dec 08 '18 at 12:58
  • It's not a dupe, but it's very close to this question of mine: https://stackoverflow.com/questions/27127862/is-a-lookup-in-a-hash-table-o1 . The answer I got (which is that complexity of these operations is measured in comparisons, not wall-time) is the same. – Paul Hankin Dec 08 '18 at 13:08
  • @WillemVanOnsem a bad hash function (for example, one that only reads the first character) will not give O(1) lookups. Any O(1) hash function can only generate finitely many outputs, and so is guaranteed to give worse than O(1) behavior for lookups. – Paul Hankin Dec 08 '18 at 13:12
  • @PaulHankin: no, not per se. For example, we could have an arbitrarily sized `Integer` (in Haskell) where, as the hash, we use `id`, which simply returns a reference to the integer. This is more or less the same as a Turing machine that performs a no-op, and thus leaves the original input on the tape. I agree, however, that it is a bit nonsensical, sure. – Willem Van Onsem Dec 08 '18 at 13:17
  • @WillemVanOnsem I guess `id`s are bounded in size, and therefore such a hashtable can have a bounded number of items in it (probably less than 2^64). Big-O requires consideration of hashtables of unbounded size. – Paul Hankin Dec 08 '18 at 13:21
  • @PaulHankin: `id` is *not* the memory address, it is the identity function: https://www.haskell.org/hoogle/?hoogle=id so this is *not* a property of the object. – Willem Van Onsem Dec 08 '18 at 13:23
  • @WillemVanOnsem oh yes, I see. But you also need to do the step where you reduce the hash function modulo N. – Paul Hankin Dec 08 '18 at 13:24
  • @PaulHankin: well, the modulo is not really necessary; we need to "project" the hash onto a domain of size *n*, and how that is done is again a variable part. We could for example take the first *log n* bits, which thus makes it *O(log n)*, with *n* some parameter that depends on the hash table itself. So in some sense, insertion in a hash table is more *O(log n)*, since calculating a modulo is usually seen as part of the insertion. This of course only starts to be a problem when *n* scales large. – Willem Van Onsem Dec 08 '18 at 13:30
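To make the exchange above concrete, here is a minimal sketch (my own illustration) of the "bad hash" being discussed: it runs in O(1) but can only produce a bounded number of distinct values, which is exactly why it cannot give O(1) lookups at scale:

```python
def first_char_hash(s: str) -> int:
    # O(1): inspects a single character, regardless of len(s) ...
    return ord(s[0]) if s else 0

# ... but it maps every possible key into at most 1,114,112 buckets
# (the number of Unicode code points). With n keys, the average chain
# length grows as O(n), so "O(1) hashing" ends up buying O(n) lookups.
```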

2 Answers


Hash tables are used for more than just strings. The O(1) complexities for insert and lookup are for hash tables in general, and they count only the operations the hash table itself performs.

Hashing and comparison are each counted as O(1), because something must always be done for them, even if you're just storing integers, but the hash table doesn't know what those operations actually cost.

If you use a hash table with a data type (like strings) that multiplies the cost of those operations, then it will multiply the complexity accordingly.
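A minimal sketch of that multiplication effect, using a toy separate-chaining table (the class and its parameters are illustrative, not a real library API):

```python
class ChainedTable:
    """Minimal separate-chaining hash table (illustrative, not production)."""
    def __init__(self, buckets: int = 64):
        self.buckets = [[] for _ in range(buckets)]

    def insert(self, key, value):
        # the cost of hash(key) depends entirely on the key type
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # comparison cost also depends on the key type
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

# With int keys, hash and == are O(1), so insert is expected O(1).
# With string keys of length s, hash and == are O(s), so the very
# same insert is expected O(s): the key type multiplies both costs.
```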

It is actually very important to consider this when measuring the complexity of a concrete algorithm that uses hash tables. Many of the string-based algorithms on this site, for example, are given complexities based on the assumption that the length of input strings is bounded by some constant. Thankfully that is usually the case.

Matt Timmermans

This question is very similar to a question I asked: Is a lookup in a hash table O(1)?

The accepted answer was that for hash tables, "time" is measured in comparisons, not in wall-clock time. Here's the full answer, quoted:

What is wrong with your reasoning is the use of conflicting definitions of "time".

When one says that lookup in a hash table takes O(1) time, one usually means that it takes O(1) comparisons, that is, the number of comparisons required to find an item is bounded above by a constant. Under this idea of "time", the actual time (as in the thing you would measure in seconds) used to compute the hash causes no variation.

Measuring time in comparisons is an approximation that, while it may not reflect reality in the same way that measuring it in seconds would, still provides useful information about the behaviour of the hash table.

This sort of thing is true for most asymptotic complexity descriptions of algorithms: people often use "time" with a very abstract meaning that isn't the informal meaning of "time", but more often than not is some variation of "number of operations" (with the kind of operation often left unstated, expected to be obvious, or clear from context).
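One way to make "counting comparisons" concrete is to instrument the keys so equality tests are counted instead of seconds. A rough sketch (the wrapper class is my own, not part of any library):

```python
class CountingKey:
    """Wraps a key and counts how many times it is compared for equality."""
    comparisons = 0

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return self.key == other.key

table = {CountingKey(f"key{i}"): i for i in range(1000)}
CountingKey.comparisons = 0
_ = table[CountingKey("key500")]      # a successful lookup
print(CountingKey.comparisons)        # a handful of comparisons, and that
                                      # count stays O(1) as the table grows
```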

Paul Hankin