1

I'm using the following hash function

function hash_djb2($str){
    $hash = 5381;
    $length = strlen($str);

    for($i = 0; $i < $length; $i++) {
        $hash = (($hash << 5) + $hash) + ord(strtolower($str[$i])) - 96;        
    }
    return $hash;
}

Am I supposed to return $hash or $hash % $numBuckets where $numBuckets is the number of buckets in the hash table?

The former would return really large numbers and make hash collision impossible while the latter returns only values between 0 and $numBuckets-1 but makes hash collision possible

user784637
  • 15,392
  • 32
  • 93
  • 156

1 Answers1

2

The output of hash_djb2() (should) cover the whole spectrum of an integer value, so if you're implementing a hash chain (i.e. limited number of buckets), you would need to use modulo to shorten the range.

Btw, the risk of hash collisions is only reduced if you decide to only use the function output; making the range smaller obviously increases that risk. You're supposed to manage that risk by allowing multiple items to be stored under the same hash value.

Ja͢ck
  • 170,779
  • 38
  • 263
  • 309
  • Yah I know, I'm using a linked list to allow multiple values to be stored in the hash value. http://www.relisoft.com/book/lang/pointer/8hash.html – user784637 Feb 20 '13 at 07:24
  • @user784637 Okay. Btw, I hope this is just for educational purposes :) – Ja͢ck Feb 20 '13 at 07:25
  • Yah of course! I don't have a CS degree so I'm just trying to learn the building blocks of data structures like hash tables, linked lists, etc... – user784637 Feb 20 '13 at 07:38