What is a fast hash function for pairs of ints?

Question

I am trying to figure out a fast datatype to store pairs of ints, where the API is just add, remove and isMember. Considering isMember must be fast, the obvious idea is to use a hash-map. Hash functions are mostly made for unbounded strings, so, my question is: considering what I am trying to hash is just a pair of ints, what is a fast hash-function with good collision properties?

It will really depend on the distribution of your data, although `num1*prime + num2` (so say `num1*31 + num2`) is usually decent. — Bernhard Barker, May 29 '14 at 12:26
Do you know anything about the distribution of the integers? If both integers are independently uniformly distributed, just adding them is good enough. If there's any correlation between the two ints, we need more information. — Sven Marnach, May 29 '14 at 12:26
I don't think there is a relation at all. Not any obvious that I can think of, the data is completely arbitrary afaik. It can be any prime, Dukeling? — MaiaVictor, May 29 '14 at 12:27
http://stackoverflow.com/questions/114085/fast-string-hashing-algorithm-with-low-collision-rates-with-32-bit-integer check out this link for your answer — Venkatesan, May 29 '14 at 12:28
@Venkatesan: This is about string hashing, which the OP explicitly is not interested in. — Sven Marnach, May 29 '14 at 12:30
If the answer is different, I'd like to hear both (as well as 64 bits, etc). The size of each int depends on the maximum size of a specific program. ~~ I've been thinking in those lines. XOR too. I just wondered if any of those is particularly best suited. — MaiaVictor, May 29 '14 at 12:33
I don't really know what makes one prime better than another - the standard Java API, for example, uses 31 pretty consistently. — Bernhard Barker, May 29 '14 at 12:38
@Viclib: If your the two ints are independently uniformly distributed, even just taking one of the two integers and completely ignoring the other one is as good a hash function as it gets (and certainly a fast one). So without any further restrictions, it's not possible to give you a better answer. — Sven Marnach, May 29 '14 at 12:39
That is actually a good answer. I might ask it again, except for the specific problem I am trying to solve? — MaiaVictor, May 29 '14 at 12:39
@SvenMarnach how can it be a good hash function since it mapps all pairs (x,y) to x (ignore y) ? — 4pie0, May 29 '14 at 12:43
@bits_international: Under the ssumption that both of the intergers are independently uniformly ditributed, this will give you a perfectly uniformly distributed hash, so you can't get any better than that, nor any faster. In the real world, your data usually isn't like this, but that's exactly the point I meant to make. — Sven Marnach, May 29 '14 at 12:52
Okay, [I just posted the question again](http://stackoverflow.com/questions/23933808/what-is-a-fast-data-structure-to-implement-a-pair-storage-as-specified-here), now being more specific about the actual need for that structure so you guys can understand better how the data is supposed to be used. I'm going to delete this question in a few minutes. Thanks! — MaiaVictor, May 29 '14 at 12:54
@SvenMarnach this is not correct, if you map (x,y)->x, you will have only N buckets, each bucket will contain N pairs, probability of a collision would be 1/N — 4pie0, May 29 '14 at 22:47
@bits_international: The whole point of a hash function is that it maps the space of all values to a smaller space, typically an int. A good hash function aims for a uniform random distribution across the hash space. Under the assumption that both of your input integers are already randomly distributed across the whole space, simply taking one of them perfectly achieves this goal, as does taking the sum, XORing them or computing `a + p*b`. In any case, the number of buckets is limited by the number of possible int values. — Sven Marnach, May 30 '14 at 10:20
@bits_international: Now you are obviously trolling. :) Let's stop right here. — Sven Marnach, May 30 '14 at 11:35
@SvenMarnach I am far from it. I stay with what I said, and I don't claim to convince you, it is hard to do here, using comments. There are important differences between univariate variable distribution and joint multivariate distribution, and (x,y)->x wouldn't be correct, it will introduce high probability of collision — 4pie0, May 30 '14 at 11:45
@SvenMarnach A small nitpick: if they are signed, overflowing would be a problem; so unsigned addition is needed. — Acorn, Jul 13 '19 at 11:33

score 2 · Answer 1 · edited Jul 13 '19 at 11:46

2

For a pair of ints you can go for Szudzik's Function. It 'elegantly' pairs two natural numbers to a unique single number.

Since you mentioned ints, it can also be negative. In that case, use different hashmaps for positive-positive, positive-negative, negative-positive, negative-negative pairs.

edited Jul 13 '19 at 11:46

double-beep

5,031
17
33
41

answered Jul 13 '19 at 11:29

Arkajyoti Banerjee

466
5
12

Ivaylo Strandjev · Answer 2 · 2018-08-15T04:32:24.007

0

The best you can get is to use a hash function for long long(e.g. in C++ it is built in) and use (p.first * (INT_MAX + 1) + p.second). This will work quite well in c++11 and also most of the common implementations of hash_map have a hash function for long long if this is not available you can use (((long long)p.first * prime1) + (long long)p.second) % prime2 where prime1 and prime2 are prime numbers that fit in integer.

edited Aug 15 '18 at 04:32

answered May 29 '14 at 14:52

Ivaylo Strandjev

69,226
18
123
176

any theory behind the second? – 4pie0 May 29 '14 at 22:49
This approach is very similar to the way hash is computed for strings. In this case the string consists of two characters but from a way bigger alphabet – Ivaylo Strandjev May 30 '14 at 04:26

What is a fast hash function for pairs of ints?

2 Answers2