0

I have to create a hash function based on 3 shorts. What is the best way to go about doing this?

Edit I have an object called Point. It is composed of three shorts (x, y, z). In order to use this object within a QSet, I have to fill in the body of the following function

uint qHash(const Point &point) {
    // return something here that is a unique combination of x, y, z so that
    // it is very quick to calculate and has minimal (if any) hash collisions
}
Jon
  • 3,985
  • 7
  • 48
  • 80

2 Answers2

2

That depends a great deal on what you need from the hash function.

Is speed critical?

Is a near-perfect hash distribution critical?

How large must your hash key be? 32-bits? 64-bits? Larger?

Without knowledge of any other specifics, you may want to consider something along these lines:

uint hash = (31 * 31 * 31 * (uint)short1) ^ (31 * 31 * (uint)short2) ^ (31 * short3);

That will be fast and should have a reasonable distribution of bits, even if the input values for the shorts are not well distributed

UPDATE:

Modified code sample to type uint. My variant should work well if input is in the range 0 to 512.

If you're interested in understanding why I multiply each input by a power of 31, see

Why does Java's hashCode() in String use 31 as a multiplier?

Community
  • 1
  • 1
Eric J.
  • 147,927
  • 63
  • 340
  • 553
  • I have added more info to my question. – Jon Mar 22 '12 at 05:53
  • Although 31 is a decent multiplier in cases where there is an open-ended number of values to be acted upon, I would think some larger values would be better if there are known to be exactly three of them. – supercat Mar 15 '13 at 23:31
  • @supercat: That is a valid consideration, though I lack the background in cryptographic theory to prove or disprove the statement. Based on what I have read, a prime number whose cube is on the order of 2^24 (depending on likely input ranges) might be a better choice. Running a given sample of say 1,000,000 valid (x,y,z) against the 31 base vs. a larger, prime base would give reasonable insight into the performance of the two options. Use the same input each run, and calculate the number of hash collisions. Use a `Dict` (key is the hash, value is the # of inputs with that hash) – Eric J. Mar 17 '13 at 16:24
  • @EricJ.: I don't think one needs cryptographic theory. If the numbers that are multiplied together never get big enough to use up all 2^32 possible `Integer` values, but some hash values are obtained by more than one combination of inputs, the hash function is certainly yields sub-optimal distribution, and the distribution could probably be improved without hurting performance. Using 31 as a multiplier, the hash function won't generate values outside roughly a +/- 32,540,000 range even though hash values have about a +/- 2,147,000,000 range. – supercat Mar 18 '13 at 14:59
  • `I don't think one needs cryptographic theory`... you *always* need a background in cryptographic theory :-) At least there are so many subtle things that can go wrong with lay-person assumptions and implementations that I don't feel comfortable not offering the disclaimer. If `n` > 40, `n^3` will not fit in a 16-bit ushort, so `(n^3) * z` will wrap a 32-bit uint. – Eric J. Mar 18 '13 at 16:02
1

If the three shorts are relatively evenly distributed, you can just use something like:

hashVal = (short1 xor short2 xor short3) modulo numBuckets

which will give you a short, reduced to a specific range from 0 to numBuckets - 1.

Whether that's suitable or not depends a great deal on how your input values will be distributed and what you expect from your hashing function.

Based on your question edit stating that the hash must go into an unsigned int, and assuming a 16-bit short and 32-bit unsigned int, there's no way to avoid collisions totally (you'd need 48 bits for that). One possibility is to use:

hashVal = (x leftshift 16) logical-or (y leftshift 8) logical-or (z)

This will combine (with logical-or) your values thus:

xxxxxxxxxxxxxxxx0000000000000000
        yyyyyyyyyyyyyyyy00000000
                zzzzzzzzzzzzzzzz

and at least minimise the possibility of simular x/y/z values affecting each other.

And, further to your comment:

I would expect my input values to be in the range of 0 to 512. How would that affect my decision?

If your input values are restricted to the range 0 through 512 (inclusive), you only need ten bits for each ( which will give you the values 0 through 1023). In that case, three of them will easily fit within a 32-bit unsigned integer, so you could use:

hashVal = (x leftshift 20) logical-or (y leftshift 10) logical-or (z)

This gives a perfect hash, with absolutely no chance of collisions.

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • What, no sub-second resolution of post times on SO? :-) First exact tie down to the second I have been in. – Eric J. Mar 22 '12 at 05:50
  • Actually, @Eric, when I hover over the "X mins ago" text, I get yours at 5:49:39 and mine at 5:49:40, so I think you may have pipped me at the post :-) – paxdiablo Mar 22 '12 at 05:52
  • I would expect my input values to be in the range of 0 to 512. How would that affect my decision? – Jon Mar 22 '12 at 05:54
  • @paxdiablo: Funny, they must be calling DateTime.Now separately to render each 'answered X seconds ago' text, because both displayed '17 seconds ago' right after posting. – Eric J. Mar 22 '12 at 05:56
  • @Jon: Pax's solution assumes well-distributed inputs, which is not the case if you expect 0 to 512. My solution multiplies by a power of 31 to better distribute bits before combining with XOR, which mitigates that problem. – Eric J. Mar 22 '12 at 05:58
  • @Jon, if your input values are restricted to 0..512, a perfect hash is possible. See the update. – paxdiablo Mar 22 '12 at 06:06
  • @pax: That's a good point. Even if he's not guaranteed all values will be in 0..512, if most values are, those bit positions can be spread across the `uint` evenly. The remaining bit positions could then be "sprinkled" over the `uint` e.g. with xor. For the usual case that they're 0, it remains a perfect hash. For the occasional case that a few bits are set, they are still well-blended. – Eric J. Mar 22 '12 at 16:46