2

I am implementing a hashtable for educational purposes. The hashtable is backed by an array, and collisions are handled with linked lists (chaining). The instructions say that I can insert the same items without checking, to increase the speed of insertion, but when a chain reaches the maximum allowed length, the hashtable needs to be resized. However, I found that resizing is not going to help at all, because the same items still go to the same bucket even after the array length is increased. Did I miss something here? Thank you very much.

  • if you have a hashtable, the algorithm calculating the correct bucket index for your input hash should calculate the index depending on the number of buckets (your array length) – Michael Ritter Jan 22 '17 at 23:08
  • You may find it helpful - http://www.algolist.net/Data_structures/Hash_table/Dynamic_resizing – fabfas Jan 22 '17 at 23:14

3 Answers

2

Let's take an example: three objects with hashcodes 7, 23 and 47.

If the hashtable is of size 8, then by modular arithmetic, all of those objects would go into hash bucket 7.

On the other hand, if the hashtable is of size 16, then the first two would go into hash bucket 7 while the other would go into bucket 15.
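As a minimal sketch, assuming the common `hash % capacity` bucket function (the class name is made up for the example), these numbers can be reproduced like this:

    public class BucketDemo {
        // Bucket index for a chained hashtable: hash modulo the number of buckets.
        static int bucketIndex(int hash, int capacity) {
            return hash % capacity;
        }

        public static void main(String[] args) {
            int[] hashes = {7, 23, 47};
            for (int capacity : new int[]{8, 16}) {
                for (int h : hashes) {
                    System.out.println("capacity " + capacity + ": hash " + h
                            + " -> bucket " + bucketIndex(h, capacity));
                }
            }
        }
    }

With capacity 8 all three hashes land in bucket 7; with capacity 16 the hash 47 moves to bucket 15.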

Joe C
  • Better yet, if the size is 15, all 3 go to different buckets –  Jan 22 '17 at 23:11
  • Technically true. But most hashmaps will have bucket counts that are powers of two, as they can leverage the `&` operator and make the map more performant. – Joe C Jan 22 '17 at 23:33
  • Thanks for all your comments. But I thought the number of buckets is always a prime number (?). – Anonymous Anonymous Jan 31 '17 at 22:29
  • You are probably thinking of the calculation of the hash code, which will often involve prime numbers in order to improve the chance of an even distribution. But the number of buckets will be a power of 2. – Joe C Feb 01 '17 at 06:02
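As a quick sanity check of that comment (a sketch, assuming non-negative hash codes): masking with `capacity - 1` picks the same bucket as `%` when the capacity is a power of two.

    public class MaskVsModulo {
        public static void main(String[] args) {
            int capacity = 16; // power of two
            for (int hash : new int[]{7, 23, 47, 1024, 12345}) {
                int byModulo = hash % capacity;
                int byMask = hash & (capacity - 1); // same result for non-negative hashes
                System.out.println(hash + ": " + byModulo + " == " + byMask);
            }
        }
    }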
2

The instructions says that I can insert same items without checking to increase speed of insertion.

You can't skip checking completely, because you would end up with duplicates on the same chain.
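For illustration, here is a hypothetical `put` for a chained table that does this per-chain check; the class, field, and method names are made up, not taken from the question's code:

    import java.util.LinkedList;

    class ChainedTable<K, V> {
        static class Entry<K, V> {
            final K key;
            V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        @SuppressWarnings("unchecked")
        private final LinkedList<Entry<K, V>>[] buckets = new LinkedList[16];

        void put(K key, V value) {
            int index = (key.hashCode() & 0x7fffffff) % buckets.length;
            if (buckets[index] == null) {
                buckets[index] = new LinkedList<>();
            }
            // Walk the chain so an existing key is updated instead of duplicated.
            for (Entry<K, V> e : buckets[index]) {
                if (e.key.equals(key)) {
                    e.value = value;
                    return;
                }
            }
            buckets[index].add(new Entry<>(key, value));
        }
    }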

But I found resizing is not going to help at all because same items still go to the same bucket even array length is increased.

This would happen only for hash values below the table size. For values above the table size, the % operator will often place the item in a different bucket, assuming that you avoid the aliasing problem.

In order to avoid aliasing, use table sizes corresponding to prime numbers. See this Q&A for additional information on this.
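A rough sketch of that effect (the capacities below are arbitrary primes chosen for the example): values that all collide at one prime size spread out once the table grows to a larger prime.

    public class PrimeResizeDemo {
        public static void main(String[] args) {
            int[] hashes = {7, 18, 29, 40};          // all congruent to 7 modulo 11
            for (int capacity : new int[]{11, 23}) { // both prime
                System.out.println("capacity " + capacity + ":");
                for (int h : hashes) {
                    System.out.println("  hash " + h + " -> bucket " + (h % capacity));
                }
            }
        }
    }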

Sergey Kalinichenko
0

I can tell you how the JDK handles that. Your entries (keys) override hashCode(), which returns an int (32 bits). When you have 16 buckets (the internal array has a length of 16), the operation performed internally to find out which bucket an entry goes to is:

 hashCode & (array.length - 1) // this is the same as a modulo operation
                               // when array.length is a power of two

That means that when you put an entry into the map, only the last 4 bits of your entries' hash codes are taken into account.

Now, when you fill those 16 buckets (or sooner, if you implement a load factor), the internal array is made bigger (let's double it), so now it has 32 buckets.

This means that the bucket an entry goes to is now computed as:

   hashCode & (32 - 1) // now there are 5 bits taken into consideration

All your entries are now re-indexed (because one more bit is now taken into account), and some of them may end up in different buckets this time.
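A small sketch of that effect, using the masking logic described above (not the actual JDK source): hash codes that agree in their low 4 bits collide at 16 buckets, but may separate at 32 buckets once the fifth bit matters.

    public class ResizeMaskDemo {
        public static void main(String[] args) {
            int[] hashCodes = {5, 21, 37};           // 21 and 37 differ from 5 only in higher bits
            for (int capacity : new int[]{16, 32}) { // both powers of two
                System.out.println("capacity " + capacity + ":");
                for (int h : hashCodes) {
                    // only the low log2(capacity) bits decide the bucket
                    System.out.println("  hashCode " + h + " -> bucket " + (h & (capacity - 1)));
                }
            }
        }
    }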

Eugene