
I've been doing some reading about hash tables, dictionaries, etc. All the literature and videos I have watched describe hash tables as having a space/time trade-off.

I am struggling to understand why a hash table takes up more space than, say, an array or a list with the same total number of elements (values). Does it have something to do with actually storing the hashed keys?

As far as I understand, and in basic terms, a hash table takes a key identifier (say some string) and passes it through some hashing function, which spits out an index into an array or some other data structure. Apart from the obvious memory usage to store your objects (values) in the array or table, why does a hash table use up more space? I feel like I am missing something obvious...

rex

2 Answers


Like you say, it's all about the trade-off between lookup time and space. The more spaces (buckets) the underlying data structure has, the more locations the hash function can choose from when storing each item, and so the chance of a collision (and therefore worse-than-constant-time performance) is reduced. However, having more buckets obviously means more space is required. The ratio of the number of items to the number of buckets is known as the load factor, and is explained in more detail in this question: What is the significance of load factor in HashMap?

In the case of a minimal perfect hash function, you can achieve O(1) performance storing n items in n buckets (a load factor of 1).
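To make the growth behaviour concrete, here is a minimal separate-chaining sketch (class and method names are illustrative, not from any standard library). It doubles its bucket array whenever the load factor exceeds a threshold, which is exactly where the "extra" space comes from: many buckets are deliberately left empty.

```python
# Minimal separate-chaining hash table sketch. Grows the bucket array
# when the load factor (items / buckets) exceeds a threshold, trading
# empty buckets for fewer collisions.

class SketchHashTable:
    def __init__(self, initial_buckets=8, max_load_factor=0.75):
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0
        self.max_load_factor = max_load_factor

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load_factor:
            self._grow()

    def _grow(self):
        # Double the bucket count and re-insert every item. The many
        # still-empty buckets after growing are the extra memory the
        # question asks about.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self.buckets[self._index(k)].append((k, v))

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```

After inserting, say, 100 items, the table will have grown to well over 100 buckets, so the array is larger than a plain list of the same 100 values would be.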

mtripp100
  • Sorry I think I'm being a bit thick, but say in the worst case (for space), each bucket stores 1 item. Isn't that effectively like an array and therefore the same size as an array and NOT more? – rex Mar 19 '14 at 10:34
  • Yes, in the ideal case, each bucket stores exactly 1 item, and so N items map to N buckets. However, in practice, this is almost never the case, because as the number of buckets occupied fills up, any sensible hash table implementation will 'grow' the underlying data structure to include more buckets to minimise the possibility of two items hashing to the same space. If space is more important than time, a hash-table probably isn't what you want. – mtripp100 Mar 20 '14 at 11:18
  • but total number of items in all buckets will still be N, so why would that take up more space than an array of N items? – rex Mar 21 '14 at 08:19
  • are you implying that we add additional buckets that can remain empty and the overhead associated with these buckets is the additional memory taken up by a hash table? – rex Mar 21 '14 at 08:26
  • @ArmenSafieh-Garabedian yes, that's the primary reason for the extra usage, although you may also have to take into account extra memory being used by the [collision resolution method](http://en.wikipedia.org/wiki/Hash_table#Collision_resolution), such as linked lists. – mtripp100 Mar 21 '14 at 09:42
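The comment thread's point can be observed directly in CPython (sizes below are CPython-specific and vary by version and platform; this measures only the shallow size of each container, not the stored objects): a dict holding N entries occupies more memory than a list of the same N values, because the dict keeps a sparse bucket array in addition to the stored keys.

```python
import sys

# CPython-specific illustration: compare the shallow size of a list
# of N values with a dict of N key/value entries.
n = 1000
values = list(range(n))
table = {i: i for i in range(n)}

list_bytes = sys.getsizeof(values)   # contiguous array of N references
dict_bytes = sys.getsizeof(table)    # sparse bucket array + keys + values
print(list_bytes, dict_bytes)
```

The dict comes out several times larger than the list, even though both hold the same 1000 values.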

As you mentioned, the underlying structure of a hash table is an array, which is one of the most basic data structures.

To make a hash table fast, i.e. to support O(1) operations, the underlying array's capacity must be larger than the number of elements stored. The load factor measures this: it is the ratio of the number of elements in the hash table to the total number of cells, and it indicates how full the table is.

To keep the hash table fast, the load factor must not exceed some threshold value. For example, with the quadratic probing collision resolution method, the load factor should not exceed 0.5. When inserting a new element would push the load factor past 0.5, we need to rehash the table into a larger array to stay within that threshold.
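A sketch of that rehash-at-0.5 rule (illustrative code, not a production implementation; it uses the triangular-number variant of quadratic probing, which visits every cell when the capacity is a power of two). Because the load factor is capped at 0.5, at least half the cells are always empty: space spent to keep probe sequences short.

```python
# Open-addressing sketch with quadratic (triangular-number) probing.
# The table rehashes into a doubled array before the load factor
# would exceed 0.5.

class QuadraticProbingTable:
    _EMPTY = object()

    def __init__(self, capacity=8):
        self.cells = [self._EMPTY] * capacity   # capacity is a power of two
        self.count = 0

    def _probe(self, key):
        # Probe offsets 0, 1, 3, 6, 10, ... (triangular numbers); for a
        # power-of-two capacity this sequence visits every cell.
        h = hash(key) % len(self.cells)
        for i in range(len(self.cells)):
            yield (h + i * (i + 1) // 2) % len(self.cells)

    def put(self, key, value):
        if (self.count + 1) / len(self.cells) > 0.5:
            self._rehash()
        for idx in self._probe(key):
            cell = self.cells[idx]
            if cell is self._EMPTY:
                self.cells[idx] = (key, value)
                self.count += 1
                return
            if cell[0] == key:
                self.cells[idx] = (key, value)  # overwrite existing key
                return

    def _rehash(self):
        # Double the capacity and re-insert every occupied cell.
        old = self.cells
        self.cells = [self._EMPTY] * (2 * len(old))
        self.count = 0
        for cell in old:
            if cell is not self._EMPTY:
                for idx in self._probe(cell[0]):
                    if self.cells[idx] is self._EMPTY:
                        self.cells[idx] = cell
                        self.count += 1
                        break

    def get(self, key):
        for idx in self._probe(key):
            cell = self.cells[idx]
            if cell is self._EMPTY:
                break
            if cell[0] == key:
                return cell[1]
        raise KeyError(key)
```

Note that the table always holds at least twice as many cells as items, which is precisely the extra space the trade-off refers to.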

So a hash table's high run-time performance is bought with extra space usage. That is the time/space trade-off.

Chris Bao