I have a 64-bit machine and I want to use the 128-bit MurmurHash3 for its speed (the MurmurHash3_x64_128 function in https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp).
However, my inputs to this hash function won't be more than 30 bytes long, in which case the for loop in MurmurHash3_x64_128 will iterate at most once (it consumes 16 bytes per iteration), and the remaining bytes are handled by the tail part. In such a scheme, it seems like the mixing won't be that great. Am I right? If not, could you please elaborate on why? If so, what would you suggest as a reasonable minimum input key length for the 128-bit MurmurHash3, so that the hashing is good?
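To make the concern concrete, here is a minimal sketch of how I understand a key to be divided between the block loop and the tail. It assumes the `len / 16` and `len & 15` arithmetic from the linked source; `Split` and `split_key` are my own hypothetical names, not part of MurmurHash3.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of MurmurHash3.cpp): reports how a key of
// `len` bytes is divided between the 16-byte block loop and the tail.
struct Split {
    std::size_t blocks;      // iterations of the 16-byte block loop
    std::size_t tail_bytes;  // bytes left over for the tail (0..15)
};

inline Split split_key(std::size_t len) {
    Split s{len / 16, len & 15};
    // Sanity check: the two parts must account for every input byte.
    assert(s.blocks * 16 + s.tail_bytes == len);
    return s;
}
```

For my longest key, `split_key(30)` gives one block and 14 tail bytes; any key shorter than 16 bytes never enters the loop at all.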
The second question is about truncating the output bits. As far as I understand from the answer at https://stackoverflow.com/a/11488383/7056851, although truncation raises the collision rate because the output range is smaller, slicing the output will still give good hash values if the original hash function is "random" enough. My question is then whether the 128-bit MurmurHash3 is a good candidate for output truncation. I am asking because I want to use MurmurHash3_x64_128 for its speed, but I only need 32-bit hash values, so I am planning to split the 128 bits into four 32-bit hash values for a given key. But I am doubtful about how good the resulting hash values are.
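This is roughly the split I have in mind, as a sketch only. `split128` is a hypothetical helper of mine, and it assumes the 16-byte result of MurmurHash3_x64_128 has already been written to the buffer passed in.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies a 16-byte hash result into four 32-bit words.
// memcpy is used instead of a pointer cast to avoid aliasing and alignment
// problems. Which 32 bits of each 64-bit half end up in which word depends
// on the platform's byte order.
inline std::array<std::uint32_t, 4> split128(const void* out128) {
    std::array<std::uint32_t, 4> parts{};
    static_assert(sizeof(parts) == 16, "expected four packed 32-bit words");
    std::memcpy(parts.data(), out128, sizeof(parts));
    return parts;
}
```

Each `parts[i]` would then be used as a separate 32-bit hash value for the same key.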
One last question is about endianness. The comment at line 52 of the linked source code says:
Block read - if your platform needs to do endian-swapping or can only handle aligned reads, do the conversion here
Why does it matter whether the platform is little-endian or big-endian? After all, all the bits are multiplied by some constants, rotated, XORed, etc., and what we want from a hash function is basically to map the input keys to the output range with a uniform distribution. How does endianness change the picture? And even if it does, what if the input is an array of char? Endianness shouldn't matter at least for keys that are arrays of chars, should it?
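For concreteness, this is how I picture the block read for a char array. It is only a sketch and both helper names are mine, not from MurmurHash3.cpp. As far as I can tell, the first version produces a different 64-bit value for the same bytes depending on the machine's byte order, while the second produces the same value everywhere.

```cpp
#include <cstdint>
#include <cstring>

// Native read: copies 8 bytes straight into a 64-bit integer, so the
// resulting value depends on the byte order of the machine.
inline std::uint64_t load_native64(const unsigned char* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Explicit little-endian read: p[0] becomes the least significant byte,
// which gives the same value on every platform.
inline std::uint64_t load_le64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | p[i];
    }
    return v;
}
```

With the bytes `01 02 03 04 05 06 07 08`, `load_le64` always returns `0x0807060504030201`, whereas `load_native64` returns that value only on a little-endian machine.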
As you can see, I am not very good at analyzing hash functions. Any clear explanation is appreciated.