3

I already have a 64 bit hash function in a library (C coding), but I only need 48 bits. I need to trim down the 64 bit hash value to a 48 bit value, yet it has to be in a safe manner in order to minimize collision.

The hash function is a very good 64 bit hash function. It has been tested with SMHasher (the "DieHarder" hash testing) and proved better than Murmur2. According to my colleagues, the algorithm implemented in the lib for 64-bit hashing is xxHash, tested with SMHasher and got a Q.Score of 10! For those who want to see it, the source code for xxHash is available on github.com : github.com/Cyan4973/xxHash/releases/latest.

The basic idea is to have all bits in the 64-bit hash value (or part of them) have an effect on the resulting 48-bit hash value. Is there any way to do that?

[Late EDIT]:
So I have implemented my own 48-bit (quasi)-UUID generator.
Please check a complete working solution (including source code) here: https://stackoverflow.com/a/47895889/4731718.

סטנלי גרונן
  • 2,917
  • 23
  • 46
  • 68
  • 10
    If it's really a *good* 64-bit hash function, then it's essentially random bits, so you can just grab 48 of them in any way you like. – Lee Daniel Crocker Oct 02 '15 at 17:46
  • there is no information kept in a hash code, unless you specifically use some special ones, like local sensitive hashing. in short, you just pick the lowest 48 bit and that's it – Jason Hu Oct 02 '15 at 17:46
  • Even if it's a really good hash function you loose 16bit of collision safety no matter what you do. And if you don't know the inner structure you might even loose more than just a quarter of collision safety than you would expect. – SkryptX Oct 02 '15 at 18:00
  • @LeeDanielCrocker Yes, it's a very good 64 bit hash function. Tested with SMHasher and proved better than Murmur2. – סטנלי גרונן Oct 02 '15 at 19:20
  • 2
    There are `2^64` 64 bit hashes. You can't put them into `2^48` 48 bit hashes without at least `2^16` of them piling up on the same spot. – Teepeemm Oct 02 '15 at 19:34
  • @Teepeemm : You are absolutely correct. Yet I'm not on a quest for perfection. I was just asking for a "decent" solution... That's why I voted UP for Lee Daniel Crocker's answer. – סטנלי גרונן Oct 02 '15 at 19:42
  • 1
    I think you're getting unhelpful responses (and downvotes) because you aren't quite clear in what you're looking for. "'keep' the information from all 64 bits" implies that you want to be able to undo the 64->48 transformation, which implies no collisions. "minimize collisions" implies that you are willing to accept some collisions (but not many). Your upvoting of Lee Daniel Coker's comment (not an answer) implies that you are satisfied with any solution that minimizes the collisions, however many there may be. That's three different possibilities. – Teepeemm Oct 02 '15 at 20:59
  • 1
    He is probably trying to build his own implementation of UUID version 1 generator. It's the only use of a 48 bit hash i know in computers world. RFC 4122 does allow the MAC address in a version 1 (or 2) UUID to be replaced by a random 48-bit node id, either because the node does not have a MAC address, or because it is not desirable to expose it. Being a security concern he probably doesn't want to disclose that he is replacing the UUID MAC adress part (48bit) with a hash algorithm. – Grigore Madalin Feb 20 '17 at 23:22
  • 3
    @Teepeemm: As a result this question is not so weird as it looks. Many programmers all around the globe face this UUID v1 48 bit problem when they try to implement it according to RFC 4122. I will vote this question up. – Grigore Madalin Feb 20 '17 at 23:34
  • @Grigore Madalin: Very interesting comment(s)! Thanks for mentioning the `UUID 48-bit` and `RFC-4122`. **Multumesc :)**... I never thought I will live the day when this question won't be negative anymore... hehe :) – סטנלי גרונן Feb 21 '17 at 07:59
  • 1
    Sincere thanks for the feedback. – greybeard Mar 16 '17 at 18:14

2 Answers2

14

If the 64-bit hash is good, then selecting any 48 bits will also be a good hash. @Lee Daniel. Of course, information is lost and not reversible.

unsigned long long Mask48 = 0xFFFFFFFFFFFFu;
unsigned long long hash48 = hash64 & Mask48;

If 64-bit hash function is weak, then mod by the largest prime just under pow(2,48). Some buckets will be lost. This will not harm a good hash, yet certainly make weak hashes better.

unsigned long long LargestPrime48 = 281474976710597u;  // FFFFFFFFFFC5
unsigned long long hash48 = hash64 % LargestPrime48;
Community
  • 1
  • 1
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
2
hash >>= 16;

But if you feel better arbitrarily preserving the other 16 bits just use XOR.

hash = (hash >> 16) ^ (hash & 0xFFFF);
Louis Ricci
  • 20,804
  • 5
  • 48
  • 62
  • 1
    Thanks, I was thinking about something similar/identical... but still need to see maybe somebody will come with some brilliant idea. Someone with strong math skills maybe :) – סטנלי גרונן Oct 02 '15 at 19:27