2

I have two UUIDs. I want to hash them perfectly to produce a single unique value, but with a constraint that f(m,n) and f(n,m) must generate the same hash.

  • UUIDs are 128-bit values
  • the hash function should have no collisions - all possible input pairings must generate unique hash values
  • f(m,n) and f(n,m) must generate the same hash - that is, ordering is not important
  • I'm working in Go, so the resulting value must fit in a 256-bit int
  • the hash does not need to be reversible

Can anyone help?

Dagrada
  • 1,135
  • 12
  • 22

2 Answers2

3

Concatenate them with the smaller one first.

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. – MZaragoza Apr 19 '15 at 02:45
  • @MZaragoza: No, it does. This fulfills all the requirements of the question. – user2357112 Apr 19 '15 at 02:47
  • can you explain how to do this. make it a bit more clear – MZaragoza Apr 19 '15 at 02:49
  • @MZaragoza: It's unclear how the OP is working with 128-bit or 256-bit numbers in go, so it's unclear how to provide example code, but suppose you wanted to hash pairs of 2-digit decimal numbers into single 4-digit decimal numbers. Then this solution would hash `(05, 22)` to `0522` or `(53, 40)` to `4053`. – user2357112 Apr 19 '15 at 02:55
  • Note that "without requiring ordering" means that the hash results should be the same if the UUIDs swap positions, not that we can't numerically compare the UUIDs as part of the hash. – user2357112 Apr 19 '15 at 03:01
  • @user2357112 good idea to concatenate with the numerically smaller one to solve the ordering problem. Can you suggest what would be a good hash function to use for the resulting 48 byte input given that it has to be collision-free? – Dagrada Apr 19 '15 at 03:08
  • @cachvico: What resulting 48-byte input? You didn't mention that in your question. – user2357112 Apr 19 '15 at 03:44
  • I specified the input is two 128-bit UUIDs, which means that after the concatenation we're dealing with 128*3 bits = 48 bytes. – Dagrada Apr 19 '15 at 15:57
  • @cachvico: Uh... shouldn't that be 128*2 bits? – user2357112 Apr 19 '15 at 18:17
  • I have two UUIDs. UUIDs are 128-bit values. So, I have 2 * 128 = 256 bits to begin with. You propose concatenating the smaller one to eliminate the ordering problem, so then I have three UUIDs = 128 * 3 = 384 bits. 384 / 8 = 48 bytes input to the hashing function. – Dagrada Apr 20 '15 at 01:25
  • @cachvico: Smaller one concatenated with larger one is 256 bits. I don't know why you think there are 3 UUIDs here. – user2357112 Apr 20 '15 at 01:57
  • Ok gotcha. Sorry I was on a different planet. So you're just saying, order them smallest first. So, what's a good hash function to use? – Dagrada Apr 20 '15 at 02:09
  • 1
    @cachvico: That's it. That's your hash. If you need other properties out of your hash that you didn't list in your question, like avalanching or resistance against bucket collisions, you might have to apply another function on top of that, but without knowing what properties you need, I can't suggest one. Consider looking through Wikipedia's [list of hash functions](http://en.wikipedia.org/wiki/List_of_hash_functions). – user2357112 Apr 20 '15 at 02:30
1

To build on user2357112's brilliant solution and boil down the comment chain, let's consider your requirements one by one (and out of order):

  • No collisions

Technically, that's not a hash function. A hash function is about mapping heterogeneous, arbitrary length data inputs into fixed-width, homogenous outputs. The only way to accomplish that if the input is longer than the output is through some data loss. For most applications, this is tolerable because the hash function is only used as a fast lookup key and the code falls back onto the slower, complete comparison of the data. That's why many guides and languages insist that if you implement one, you must implement the other.

Fortunately, you say:

  • Two UUID inputs m and n
  • UUIDs are 128 bits each
  • Output of f(m,n) must be 256 bits or less

Combined your two inputs are exactly 256 bits, which means you do not have to lose any data. If you needed a smaller output, then you would be out of luck. As it is, you can concatenate the two numbers together and generate a perfect, unique representation.

  • f(m,n) and f(n,m) must generate the same hash

To accomplish this final requirement, make a decision on the concatenation order by some intrinsic value of the two UUIDs. The suggested smaller-first works just great. However...

  • The hash does not need to be reversible

If you specifically need irreversible hashing, that's a different question entirely. You could still use the less-than comparison to ensure order independence when feeding to a cryptographically hash function, but you would be hard pressed to find something that guaranteed no collisions even with fixed-width inputs a 256 bit output width.

Community
  • 1
  • 1
Patrick M
  • 10,547
  • 9
  • 68
  • 101