1

There are several topics on this in stackoverflow as well as in web, I could not understand them clearly. I found this answer in stackoverflow by Nicklas and his representation which gave me some meaningful insight over the topic.

Some ASCII art

key.hashCode()
     |
     | 32-bit value
     |                          hash table
     V                        +------------+    +----------------------+
HashMap.hash() ----+          | reference  | -> | key1 | value1 | null |
                   |          |------------|    +----------------------+
                   |          | null       |
                   | offset   |------------|    +---------------------+
                   +--------> | reference  | -> | key1 | value1 | ref |
                              |------------|    +---------------------+
                              |    ....    |                       |
                                                  +----------------+
                                                  V
                                                +----------------------+
                                                | key2 | value2 | null |
                                                +----------------------+

What hashing function does Java use to implement Hashtable class?

Map aMap = new HashMap();
aMap.put(key,value);
  1. Can anyone explain me `what is an offset and its value?. Where is the offset value getting mapped?.
  2. what's that hash-table?. Is it an Array of Objects?, If so does each object have an key, value1 and a reference propery?.

Can anyone re-explain the above diagram step-by-step. I am not able to understand the Hashing mechanism behind HashMap.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Shane
  • 5,517
  • 15
  • 49
  • 79
  • 1
    https://en.wikipedia.org/wiki/Hash_table – Oliver Charlesworth Sep 24 '13 at 22:03
  • For (2), just look at the Java source, the 4th member of `HashMap` is `Entry[] table`. – Bernhard Barker Sep 24 '13 at 22:11
  • `"2. The 32-bit value is then transformed by a second hash function (...) into an offset inside the hash table"` - is that unclear? – Bernhard Barker Sep 24 '13 at 22:12
  • Is Hashtable behaving like an array of objects, where does the offset value go?. I am just confused from when the hash function generates the offset value. – Shane Sep 24 '13 at 22:15
  • In `put`, the `key` value is hashed. The hash value is divided by the size of the hash table, and the *remainder* from that division (the `offset` in the above diagram) is used as the index to index into the actual hash table array. In the simplest case, where the hash table slot is empty, a new small object is created containing the `key` and the `value`. (`ref` is null.) The reference (pointer) to that object is placed in the selected hash table slot. – Hot Licks Sep 24 '13 at 22:43
  • The hash value is hashed again using a different function and the mod of that and the array size is taken, which is the offset, i.e. the position in the array. – Bernhard Barker Sep 24 '13 at 22:55

1 Answers1

7

First, lets I explain idea of hash search:

  1. Lets you have some key for search. For example,key is just text string, like "hello".

  2. Lets we compute some "checksum" for this string - for simple example, just by adding together ASCII codes of symbols. Lets (for example, I did not counted actually) this sum is 1009.

  3. Lets we have array of fixed size, for example, SZ=10. Name this array as "hash_table". Each array element contains BUCKET -- this is set of possible keys.

  4. So, we can compute OFFSET - index in the array, i.e. hash-table, by:

    offset = sum % SZ = 1009 % 10 = 9;

  5. Lets we deposit our word "hello" into busker #9 by:

    hash_table[9].add_to_list("hello");

There, we have just single key "hello", stored in the cell hash_table[9];

  1. Imagine, need insert 2nd key, for example, "world". Lets we compute sum, and it will be 37. Again, compute offset = 37 % 10 = 7. So, by this algorithm, we deposit key "world" into hashtable[7].

  2. Collision. Imagine, you decided add 3rd word-key, "collision" Lets sum for this word will be 3007. When you compute offset, it would be 7. So, word "collision" must be deposited into the bucket, where is already exist word "world". This is OK. You just add element with world "collision" into linklist (head or tail). So:

    hashtable[7] -> "world" -> "collistion" -> NULL; hashtable[9] -> "hello" ->NULL;

  3. When you need search for a key "hello", you again compute checksum, and it again would be 1009. Thereafter, you compute offset = 9, and go directly to bucket #9. And, iterate linklist, and there you will found your word "hello". Analogous situation with "world" or "collision".

When you search for word, omitted in the table, you go to empty bucket, or to bucket, which does not contains your key. So, search unsuccessful.

if you make hash table too wide (for example, size as same as your dictionary), average bucket size would be ~1 element (if hash function has good spreading). So, search for a key wil be quick - just compute hashsum, go to bucket, and iterate 1-2 elements in the bucket.

Second - I will answer to your questions:

  1. Offset - just index in the array. Array is a "hash table". Each array element contains pointer to linked list, contains keys, stored in the same bucket.

  2. hashtable in the bucket scheme - just array, contains pointers to heads of buckets. With another hashing scheme (for example, multiple hashing, etc) - it can contains objects theyself, without linklists.

olegarch
  • 3,670
  • 1
  • 20
  • 19
  • Thanks for this explanation. From my understanding the HashCode calculated is nothing but the index of an array. – Shane Sep 25 '13 at 13:41
  • Yes, you are correct. But, algorithm for compute hash sum isn't trivial. For example, just sum of ASCII-codes is not good, because of independent against letter position changing, i.e. sum("int") == sum("tin"). I designed my own hashing algorithm, see http://olegh.cc.st/hashing.pdf page #94. – olegarch Sep 25 '13 at 15:21
  • Sorry, no.. Maybe, sometime I will rewrite in English.. But I stopped write it ~3 years ago, because I do not see any significant interest to this theme... – olegarch Sep 25 '13 at 19:25
  • I am sorry for troubling you more, http://i.stack.imgur.com/zHvEe.jpg in this image what does hash value map too... Does the hash value contain the index of array or something else... http://stackoverflow.com/questions/11596549/how-does-javas-hashmap-work-internally – Shane Sep 25 '13 at 19:31
  • I do not programming on Java, so do not know details of HashMap implementation in that language. In my book, I described in the 5.1.4. algorithm with secondary hash key, hash2. This key used for quick comparison within iterate bucket. What saved in the value "hash" in the picture - I do not know. Perhaps it is secondary hash, just for purpose, as I wrote above. If this is copy of original hash value - this is not smart idea, since there is few "new information" for secondary comparison. – olegarch Sep 25 '13 at 21:52