16

Does anyone know how to do this and what the pseudo code would look like?

As we all know a hash table stores key,value pairs and when a key is a called, the function will return the value associated with that key. What I want to do is understand the underlying structure in creating that mapping function. For example, if we lived in a world where there were no previously defined functions except for arrays, how could we replicate the Hashmaps that we have today?

locoboy
  • 38,002
  • 70
  • 184
  • 260

4 Answers4

27

Actually, some of todays Hashmap implentations are indeed made out of arrays as you propose. Let me sketch how this works:

Hash Function A hash function transforms your keys into an index for the first array (array K). A hash function such as MD5 or a simpler one, usually including a modulo operator, can be used for this.

Buckets A simple array-based Hashmap implementation could use buckets to cope with collissions. Each element ('bucket') in array K contains itself an array (array P) of pairs. When adding or querying for an element, the hash function points you to the correct bucket in K, which contains your desired array P. You then iterate over the elements in P until you find a matching key, or you assign a new element at the end of P.

Mapping keys to buckets using the Hash You should make sure that the number of buckets (i.e. the size of K) is a power of 2, let's say 2^b. To find the correct bucket index for some key, compute Hash(key) but only keep the first b bits. This is your index when cast to an integer.

Rescaling Computing the hash of a key and finding the right bucket is very quick. But once a bucket becomes fuller, you will have to iterate more and more items before you get to the right one. So it is important to have enough buckets to properly distribute the objects, or your Hashmap will become slow.

Because you generally don't know how much objects you will want to store in the Hashmap in advance, it is desirable to dynamically grow or shrink the map. You can keep a count of the number of objects stored, and once it goes over a certain threshold you recreate the entire structure, but this time with a larger or smaller size for array K. In this way some of the buckets in K that were very full will now have their elements divided among several buckets, so that performance will be better.

Alternatives You may also use a two-dimensional array instead of an array-of-arrays, or you may exchange array P for a linked list. Furthermore, instead of keeping a total count of stored objects, you may simply choose to recreate (i.e. rescale) the hashmap once one of the buckets contains more than some configured number of items.

A variation of what you are asking is described as 'array hash table' in the Hash table Wikipedia entry.

Code For code samples, take a look here.

Hope this helps.

Joseph Tanenbaum
  • 2,211
  • 15
  • 30
0

Sample Explanation:

At the below source, basically it does two things:

1. Map Representation

  • Some (X number of List) of lists
  • X being 2 power N number of lists is bad. A (2 power N)-1, or (2 power N)+1, or a prime number is good.

Example:

List myhashmap [hash_table_size];
// an array of (short) lists
// if its long lists, then there are more collisions

NOTE: this is array of arrays, not two arrays (I can't see a possible generic hashmap, in a good way with just 2 arrays)

If you know Algorithms > Graph theory > Adjacency list, this looks exactly same.

2. Hash function

And the hash function converts string (input) to a number (hash value), which is index of an array

  • initialize the hash value to first char (after converted to int)
  • for each further char, left shift 4 bits, then add char (after converted to int)

Example,

int hash = input[0];
for (int i=1; i<input.length(); i++) {
    hash = (hash << 4) + input[i]
}

hash = hash % list.size()
// list.size() here represents 1st dimension of (list of lists)
//      that is 1st dimension size of our map representation from point #1
//      which is hash_table_size

See at the first link:

int HTable::hash (char const * str) const

Source:
http://www.relisoft.com/book/lang/pointer/8hash.html
How does a hash table work?

Update
This is the Best source: http://algs4.cs.princeton.edu/34hash/

Community
  • 1
  • 1
Manohar Reddy Poreddy
  • 25,399
  • 9
  • 157
  • 140
0

Could you be more precise? Does one array contain the keys, the other one the values?

If so, here is an example in Java (but there are few specificities of this language here):

for (int i = 0; i < keysArray.length; i++) {
    map.put(keysArray[i], valuesArray[i]);
}

Of course, you will have to instantiate your map object (if you are using Java, I suggest to use a HashMap<Object, Object> instead of an obsolete HashTable), and also test your arrays in order to avoid null objects and check if they have the same size.

Romain Linsolas
  • 79,475
  • 49
  • 202
  • 273
  • Yes, indeed, I didn't see that. I've edited my answer, but the main part is not really specific to Java. – Romain Linsolas Nov 06 '10 at 16:46
  • 4
    I'm pretty sure he wants to create his own implementation of a hash table using two arrays. – sepp2k Nov 06 '10 at 18:02
  • 1
    yes I'm looking to create my own implementation of a hash table. I don't want to use any previously defined objects. I assume that we will need a hashing function (to generate values for value indicies), two arrays (to store keys and values), and a way to get/resolve collsions. – locoboy Nov 06 '10 at 22:11
-1

You mean like this?

The following is using Ruby's irb as an illustration:

 cities = ["LA", "SF", "NY"]
 => ["LA", "SF", "NY"] 

 items = ["Big Mac", "Hot Fudge Sundae"]
 => ["Big Mac", "Hot Fudge Sundae"] 

 price = {}
 => {} 

 price[[cities[0], items[1]]] = 1.29
 => 1.29 

 price
 => {["LA", "Hot Fudge Sundae"]=>1.29} 

 price[[cities[0], items[0]]] = 2.49
 => 2.49 

 price[[cities[1], items[0]]] = 2.99
 => 2.99 

 price
 => {["LA", "Hot Fudge Sundae"]=>1.29, ["LA", "Big Mac"]=>2.49, ["SF", "Big Mac"]=>2.99} 

 price[["LA", "Big Mac"]]
 => 2.49 
nonopolarity
  • 146,324
  • 131
  • 460
  • 740
  • 2
    thanks, but where exactly are you defining the hashing function? to my knowledge you need a hashing function, two arrays and a way to get rid of collisions. – locoboy Nov 06 '10 at 22:10