How to hash a 96-bit struct/number?

Question

So I can't figure out how to do this in C++. I need to do a modulus operation and integer conversion on data that is 96 bits in length.

Example:

struct Hash96bit
{
   char x[12];
};

int main()
{
   Hash96bit n;
   // set n to something
   int size = 23;
   int result = n % size
}

Edit: I'm trying to have a 96 bit hash because i have 3 floats which when combined create a unique combination. Thought that would be best to use as the hash because you don't really have to process it at all.

Edit: Okay... so at this point I might as well explain the bigger issue. I have a 3D world that I want to subdivide into sectors, that way groups of objects can be placed in sectors that would make frustum culling and physics iterations take less time. So at the begging lets say you are at sector 0,0,0. Sure we store them all in array, cool, but what happens when we get far away from 0,0,0? We don't care about those sectors there anymore. So we use a hashmap since memory isn't an issue and because we will be accessing data with sector values rather than handles. Now a sector is 3 floats, hashing that could easily be done with any number of algorithms. I thought it might be better if I could just say the 3 floats together is the key and go from there, I just needed a way to mod a 96 bit number to fit it in the data segment. Anyway I think i'm just gonna take the bottom bits of each of these floats and use a 64 bit hash unless anyone comes up with something brilliant. Thank you for the advice so far.

You can use any hashing algorithm you want. Siphash, spooky hash, Knuth, Jenkins whatever. — David Schwartz, Aug 07 '14 at 03:23
Well if I use 96 bits then it is garuanteed to be unique with no extra processing. Isn't that best? — user2940623, Aug 07 '14 at 03:29
Where did "guaranteed to be unique" come from? Do you have some special requirements that you haven't shared with us? If so, you're going to get bad advice. — David Schwartz, Aug 07 '14 at 03:31
Wait a minute. Are you trying to hash a 96-bit number? Or are you trying to produce a 96-bit hash?! — David Schwartz, Aug 07 '14 at 03:32
I'll edit the question. But anyway, there are 3 floats and each combination of floats is unique. 4 bytes per float = 96 bit. If I do 96 bit hash, then it must be unique. — user2940623, Aug 07 '14 at 03:58
No hash is guaranteed to be unique. If you generate a 32-bit number from 96-bits, you can expect that there will be approximately 2^64 collisions for each 32-bit hash value. In general, hashes produce a non-unique value over the whole set of possible inputs, because a typical hash value is some fixed number of bits, and the total set of inputs ranges over some much larger number of bits. The objective of a hash function is to make it unlikely that you'll find two different inputs that hash to the same value, but the smaller the hash value, the more likely it is that collisions will be found. — Jonathan Leffler, Aug 07 '14 at 04:04
The point of a hash function is often not to be unique... for example, if you're using it to map keys to buckets in a hash table, then you're probably taking the hash function output and `%`-ing it by the current table size... if that's say a power of two then the mod effectively just takes however many of the least significant bits it needs, and if your hash output for two keys only differed in more significant bits then you have a collision. For such purposes, you want a bit changed anywhere in the input to almost "randomly" affect all the bits in the hash output - some flip, some don't. — Tony Delroy, Aug 07 '14 at 04:06
What is your actual problem? What do you want a hash for? And why do you are if it's unique? (And if you care if it's unique, why are you applying a modulo operation to it, which will ensure it's no longer unique.) — David Schwartz, Aug 07 '14 at 04:09

score 1 · Answer 1 · answered Aug 07 '14 at 03:26

1

You can use jenkins hash.

uint32_t jenkins_one_at_a_time_hash(char *key, size_t len)
{
    uint32_t hash, i;
    for(hash = i = 0; i < len; ++i)
    {
        hash += key[i];
        hash += (hash << 10);
        hash ^= (hash >> 6);
   }
   hash += (hash << 3);
   hash ^= (hash >> 11);
   hash += (hash << 15);
   return hash;
}

answered Aug 07 '14 at 03:26

David Schwartz

179,497
17
214
278

**Caution**: This answer assumes you are trying to hash a 96-bit struct/number, as your question says. If you are in fact trying to produce a 96-bit hash, this answer is incorrect. And if you have security requirements, it may be completely incorrect. – David Schwartz Aug 07 '14 at 03:33
I think i want a 96-bit hash. Really all I want is a way to mod(%) a 96 bit number or struct if your representing that way. – user2940623 Aug 07 '14 at 04:11

Tony Delroy · Accepted Answer · 2014-08-07T04:52:27.287

1

UPDATE: Having just read your second edit to the question, I'd recommend you use David's jenkin's approach (which I upvoted a while back)... just point it at the lowest byte in your struct of three floats.

Regarding "Anyway I think i'm just gonna take the bottom bits of each of these floats" - again, the idea with a hash function used by a hash table is not just to map each bit in the input (less till some subset of them) to a bit in the hash output. You could easily end up with a lot of collisions that way, especially if the number of buckets is not a prime number. For example, if you take 21 bits from each float, and the number of buckets happens to be 1024 currently, then after % 1024 only 10 bits from one of the floats will be used with no regard to the values of the other floats... hash(a,b,c) == hash(d,e,c) for all c (it's actually a little worse than that - values like 5.5, 2.75 etc. will only use a couple bits of the mantissa....).

Since you're insisting on this (though it's very likely not what you need, and a misnomer to boot):

struct Hash96bit
{
   union {
       float f[3];
       char x[12];
       uint32_t u[3];
   };

   Hash96bit(float a, float b, float c)
   {
       f[0] = a;
       f[1] = b;
       f[2] = c;
   }

   // the operator will support your "int result = n % size;" usage...
   operator uint128_t() const
   {
       return u[0] * ((uint128_t)1 << 64) +  // arbitrary ordering
              u[1] + ((uint128_t)1 << 32) +
              u[2];
   }
};

edited Aug 07 '14 at 04:52

answered Aug 07 '14 at 04:20

Tony Delroy

102,968
15
177
252

Sounds fun. Thanks. And actually! ... I should have tried that. Didn't know there was a 128 bit in the c types library. – user2940623 Aug 07 '14 at 04:30
1

@user2940623 it's not a Standard-mandated type... up to your implementation whether it's provided. With GCC for example, you may have to call it `unsigned __int128` or `__uint128`. If not provided, you'll have to fall back on boost [see here](http://stackoverflow.com/questions/18439520/is-there-a-128-bit-integer-in-c) or something like `uint64_t operator%(int n) const` where you hope the post-`%` value can be represented in `uint64_t` (otherwise you truncate or throw). – Tony Delroy Aug 07 '14 at 04:46
I'm gonna try using the upper bits of the float and check the collisions. If that doesn't work ill use some kind of octet hash. – user2940623 Aug 07 '14 at 05:46
@user2940623 *can lead a horse to water...* if you don't want to use David's code, knock yourself out with whatever tickles your fancy ;-). – Tony Delroy Aug 07 '14 at 06:06
1

You were right, the upper bits was a terrible hash function. lol – user2940623 Aug 10 '14 at 17:51

How to hash a 96-bit struct/number?

2 Answers2