Extremely fast hash function with collisions allowed

Question

My key is a 64 bit address and the output is a 1 byte number (0-255). Collisions are allowed but the probability of them occurring should be low. Also, assume that number of elements to be inserted are low, lets say not more than 255, as to minimize the pigeon hole effect.

The addresses are addresses of the functions in the program.

what's the distribution of your addresses? it's going to help with the answers, what addresses are they? — talkol, Jan 31 '13 at 16:38
If your hash has 256 values possible, the number of elements should be far less than 256 if you want to avoid collisions. With 255 elements, collisions will be numerous. — AProgrammer, Jan 31 '13 at 17:00
This isn't well-defined at the moment. What's the distribution? What collision rate is tolerable? — Oliver Charlesworth, Jan 31 '13 at 17:02
Should the hash be secure against predicting the output of a given input (on multiple iterations of the same program)? — Alex Chamberlain, Jan 31 '13 at 17:04
At the very best, the chance of a collision will be 1 in 256. With less than ten elements you'll be odds on for a collision, http://blogs.msdn.com/b/ericlippert/archive/2010/03/22/socks-birthdays-and-hash-collisions.aspx — Jodrell, Jan 31 '13 at 17:23
Actually, your first element will have 0 chance of a collision but it increases rapidly from there. 2 readers of this post almost certainley share a birthday. — Jodrell, Jan 31 '13 at 17:25
@AlexChamberlain I've corrected my rather optimistic assertion but the sentiment stands (27th April.) — Jodrell, Jan 31 '13 at 17:31

score 3 · Answer 1 · answered Jan 31 '13 at 16:38

3

uint64_t addr = ...
uint8_t hash = addr & 0xFF;

I think that meets all of your requirements.

answered Jan 31 '13 at 16:38

Alex Chamberlain

4,147
2
22
49

4

it all depends on the distribution of the addresses, it could easily turn out to be horrible – talkol Jan 31 '13 at 16:39
Trying to hash 255 keys to 256 values will probably be horrible in any case. This seems like the fastest possible solution. – Reinstate Monica -- notmaynard Jan 31 '13 at 16:42
@talkol Agreed; however, +1 anyway, since taking the eight low-order bits is about as fast as it gets and, if addresses are uniformly distributed mod 256 (not an unreasonable assumption), it is also about as good at avoiding collisions as it gets. – Patrick87 Jan 31 '13 at 16:42
3

addresses do not distribute uniformly, I'm pretty sure a function address can't be odd, probably has to be divisible by 4 – talkol Jan 31 '13 at 16:44
1

newer gcc will lay out functions so they're aligned on 16 byte boundaries – nos Jan 31 '13 at 16:54
I only posted this answer because the question was ill-defined. – Alex Chamberlain Jan 31 '13 at 16:58
1

And malloc() blocks are almost always 16 byte aligned on 64 bit platforms. But Alex is right, the requirements are such that a "fancy" hash function is probably overkill -- it doesn't take much work at all to get even distribution over 256 possible values. Just pick some middle bits then: `(((long)addr)>>5) & 0xff` – Andy Ross Jan 31 '13 at 17:07
@AndyRoss We don't actually know he wants to hash malloc blocks. He could be getting a page aligned block, dividing it up into objects and hashing the addresses of said objects... – Alex Chamberlain Jan 31 '13 at 17:16
@AndyRoss - you can't be sure of the alignment on every platform, so you'll end up throwing the max lsb's. this is bad because the lowest "good" bits offer the maximum variation and you don't wanna throw them – talkol Jan 31 '13 at 17:20
Great sense of humor, loved it ! – Rerito Jan 31 '13 at 18:51

talkol · Answer 2 · 2013-01-31T17:17:47.533

3

I would XOR together the 2 LSB (least significant bytes), if this distribues badly, then add a 3rd one, and so forth

The rationale behind this is the following: function addresses do not distribute uniformly. The problem normally lies in the lower (lsb) bits. Functions usually need to begin in addresses divisible by 4/8/16 so the 2-4 lsb are probably meaningless. By XORing with the next byte, you should get rid of most of these problems and it's still pretty fast.

edited Jan 31 '13 at 17:17

answered Jan 31 '13 at 16:42

talkol

12,564
11
54
64

XORin with the higher byte achieves the same effect, is as quick, and is a bit simpler since you're not sure how many bits to shift right anyways.. because the lower "good" bits provide the most variation so you won't want to throw them out if you don't have to – talkol Jan 31 '13 at 17:11

score 1 · Answer 3 · edited May 23 '17 at 12:04

1

Function addresses are, I think, quite likely to be aligned (see this question, for instance). That seems to indicate that you want to skip least significant bits, depending on the alignment.

So, perhaps take the 8 bits starting from bit 3, i.e. skipping the least significant 3 bits (bits 0 through 2):

const uint8_t hash = (address >> 3);

This should be obvious from inspection of your set of addresses. In hex, watch the rightmost digit.

edited May 23 '17 at 12:04

Community

1
1

answered Jan 31 '13 at 16:47

unwind

391,730
64
469
606

If the functions are all in the same dso or all in the main program, the high bits will all be the same too. I suspect just bits 4-18 are significant. – R.. GitHub STOP HELPING ICE Jan 31 '13 at 17:22

score 1 · Answer 4 · answered Jan 31 '13 at 16:48

1

How about:

uint64_t data = 0x12131212121211B12;

uint32_t d1 = (data >> 32) ^ (uint32_t)(data);
uint16_t d2 = (d1 >> 16) ^ (uint16_t)(d1);
uint8_t  d3 = (d2 >> 8) ^ (uint8_t)(d2);

return d3;

It combined all bits of your 8 bytes with 3 shifts and three xor instructions.

answered Jan 31 '13 at 16:48

Mats Petersson

126,704
14
140
227

very nice :) but a bit overkill – talkol Jan 31 '13 at 17:25
Could benefit from avoiding 16-bit and 8-bit arithmetic operations on an x86-64 or x86 processor as you don't need to reduce the width down to 8 bits until the very end. Also the last shift would probably be better done with a greater shift right, such as (8+3) or (8+4) because of function alignment as mentioned by an earlier poster. – Cecil Ward Jan 28 '17 at 23:08

Extremely fast hash function with collisions allowed

4 Answers4