I'm trying to implement the djb2 hash in Python.
Here it is in C:
/* djb2 hash http://www.cse.yorku.ca/~oz/hash.html */
uint64_t djb2(size_t len, char const str[len]) {
    uint64_t hash = 5381;
    uint8_t c;
    for (size_t i = 0; i < len; i++) {
        c = str[i];
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }
    return hash;
}
And here's my attempt in Python:
from ctypes import c_uint64, c_byte, cast, POINTER
def djb2(string: str) -> c_uint64:
    hash = c_uint64(5381)
    raw_bytes = cast(string, POINTER(c_byte * len(string)))[0]
    for i in range(0, len(raw_bytes)):
        hash = c_uint64((((((hash.value << 5) & 0xffffffffffffffff) + hash.value) & 0xffffffffffffffff) + raw_bytes[i]) & 0xffffffffffffffff) # hash * 33 + c
    return hash
However, the two versions give different results for the same input. I suspect the cause is different overflow behavior, or some other arithmetic difference between C and Python.
The masking in the Python version is an attempt to force 64-bit overflow (based on this answer).
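To make the intended overflow behavior concrete, here is a minimal sketch of the masking idea on plain Python ints. It assumes the input is already a `bytes` object (for example from `string.encode()`), which is an assumption and is not what the ctypes cast in my attempt does. The names `MASK64` and `djb2_masked` are hypothetical and exist only for this illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # keeps only the low 64 bits, like uint64_t wraparound


def djb2_masked(data: bytes) -> int:
    """djb2 over raw bytes, emulating unsigned 64-bit overflow with a mask."""
    h = 5381
    for c in data:  # iterating over bytes yields ints in the range 0..255
        # hash * 33 + c, reduced modulo 2**64
        h = ((h << 5) + h + c) & MASK64
    return h


if __name__ == "__main__":
    print(djb2_masked(b"a"))  # 5381 * 33 + 97 = 177670
```

Because Python ints never overflow, a single mask per iteration should be enough here: reducing modulo 2**64 once at the end of each step gives the same value as reducing after every intermediate operation.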