2

I need to construct a perfect hash function that maps a set of integer [1..2^64 - 1] to itself (this function is actually some some complex permutation).

To explain this problem suppose that we have sequence of integer primary keys in database. We need to show construct a number (which we show to users) in a way that close numbers corresponds to primary keys that as far from one another as possible.

So, basically I need a bijective function for large set of integers. E.g.

  • 1 -> X1
  • 2 -> X3
  • 3 -> X3
  • ...
  • 2^64 - 1 -> X2^64 - 1

Any suggestions or references will be appreciated.

1 Answers1

0

To space out any two numbers maximally in a space from 0 to upperlimit (excluding) i would set their distance to roughly one-half of upperlimit.

In python it would look like this (code only works if upperlimit is even, otherwise the last element collides):

def my_hash(n, upperlimit):
    return n * upperlimit / 2 % upperlimit + n / 2

def my_unhash(n, upperlimit):
    return n % (upperlimit / 2) * 2 + n / (upperlimit / 2)

Example result:

upperlimit = 16
for i in range(upperlimit):
    h = my_hash(i, upperlimit)
    u = my_unhash(h, upperlimit)
    print "%02d -> %02d -> %02d" % (i, h, u)

00 -> 00 -> 00
01 -> 08 -> 01
02 -> 01 -> 02
03 -> 09 -> 03
04 -> 02 -> 04
05 -> 10 -> 05
06 -> 03 -> 06
07 -> 11 -> 07
08 -> 04 -> 08
09 -> 12 -> 09
10 -> 05 -> 10
11 -> 13 -> 11
12 -> 06 -> 12
13 -> 14 -> 13
14 -> 07 -> 14
15 -> 15 -> 15

The second column shows the hashed values. You can exclude the 0 if you want, because it maps to itself.

fafl
  • 7,222
  • 3
  • 27
  • 50
  • not sure how important or not is that requirement, but please note that with dividing the space into 2, you make any 2 consecutive numbers far apart, but by skipping every other number, they are close 0 -> 0, 2 -> 1, 4 -> 2... same holds for odd numbers. Better strategy (if we want to reduce proximity of numbers which are 2 apart) is to divide the available space into N buckets and shuffling the numbers between these. Obviously using N buckets would mean that every Nth number would get close... but by adjusting N (say for 64bit space, use N as 1024 or higher power of 2) would give spread. – diginoise May 26 '17 at 16:13
  • Good points, the requirements are not clear here. I maximized the distance of consecutive numbers. – fafl May 26 '17 at 19:32