2

I need to create a function which takes a single integer as argument in the range 0-N and returns a seemingly random number in the same range.

Each input number should always have exactly one output and it should always be the same.

Such a function would produce something like this:

f(1) = 4
f(2) = 1
f(3) = 5
f(4) = 2
f(5) = 3

I believe this could be accomplished by some kind of a hashing algorithm? I don't need anything complex, just not something too simple like f(1) = 2, f(2) = 3 etc.

The biggest issue is that I need this to be reversible. E.g. the above table should be true left-to-right as well as right-to-left, using a different function for the right-to-left conversion is fine.

I know the easiest way is to create an array, shuffle it and just store the relations in a db or something, but as I need N to be quite large I'd like to avoid this if possible.

Edit: For my particular case N is a specific number, it's exactly 16777216 (64^4).

Tony Bogdanov
  • 7,436
  • 10
  • 49
  • 80
  • 2
    If it needs to be reversible, then it isn't a hash; why not simply a bitwise xor with a fixed value? – Mark Baker May 05 '16 at 18:41
  • I do this regularly. I have a set of people identified with SSN. When running reports for researchers, I scramble up the digits. I can unscramble them if I need to go back and add more data. I can post a description of such a function as an answer if you like. – kainaw May 05 '16 at 18:41
  • @MarkBaker Could you elaborate? I've tried a simple $seed ^ $value for all numbers from 1 to 100 with 34 as the seed and it produced numbers outside of the range. – Tony Bogdanov May 05 '16 at 19:15
  • You probably want to use a seed a lot closer to your max value – Mark Baker May 05 '16 at 20:49
  • Power-of-two ranges, like the example given are fairly straightforward, but arbitrary ranges are kind of a headache. Can you guarantee that it's always a power of two range? – sh1 May 06 '16 at 02:56
  • Yes, `N` will always be exactly 64^4 as the number will be represented in base 64 with max 4 characters. – Tony Bogdanov May 06 '16 at 06:23

3 Answers3

3

If the range is always a power of two -- like [0,16777216) -- then you can use exclusive-or just as @MarkBaker suggested. It just doesn't work so easily if your range is not a power of two.

You can use addition and subtraction modulo N, although these alone are too obvious, so you have to combine it with something else.

You can also do multiplication modulo-N, but reversing that is complicated. To make it simpler, we can isolate the bottom eight bits and multiply those and add them in a way that doesn't interfere with those bits so we can use them again to reverse the operation.

I don't know PHP so I'm going to give an example in C, instead. Maybe it's the same.

int enc(int x) {
  x = x + 4799 * 256 * (x % 256);
  x = x + 8896843;
  x = x ^ 4777277;
  return (x + 1073741824) % 16777216;
}

And to decode, play the operations back in reverse order:

int dec(int x) {
  x = x + 1073741824;
  x = x ^ 4777277;
  x = x - 8896843;
  x = x - 4799 * 256 * (x % 256);
  return x % 16777216;
}

That 1073741824 must be a multiple of N, and 256 must be a factor of N, and if N is not a power of two then you can't (necessarily) use exclusive-or (^ is exclusive-or in C and I assume in PHP too). The other numbers you can fiddle with, and add and remove stages, at your leisure.

The addition of 1073741824 in both functions is to ensure that x stays positive; this is so that the modulo operation doesn't ever give a negative result, even after we've subtracted values from x which might have made it go negative in the interim.

sh1
  • 4,324
  • 17
  • 30
  • This looks like it might just work. I'll test it and report back later. – Tony Bogdanov May 06 '16 at 06:23
  • I run your solution for all the numbers in the range against 3 test: 1. The encoding then decoding must return the original input, 2. The encoding should never return duplicates, 3. The result should be in the same range. It passed all 3. This will be the solution I'll use, thanks! – Tony Bogdanov May 06 '16 at 09:42
  • Just be aware that it's only a light obfuscation, and it has some weaknesses as a hash function. A bit more work is required to make it a good hash. – sh1 May 06 '16 at 16:28
1

I offered to describe how I "randomly" scramble up 9-digit SSNs when producing research data sets. This does not replace or hash an SSN. It re-orders the digits. It is difficult to put the digits back in the correct order if you don't know the order in which they were scrambled. I have a gut feeling that this is not what the questioner really wants. So, I am happy to delete this answer if it is deemed off-topic.

I know that I have 9 digits. So, I start with an array that has 9 index values in order:

$a = array(0,1,2,3,4,5,6,7,8);

Now, I need to turn a key that I can remember into a way to shuffle the array. The shuffling has to be the same order for the same key every time. I use a couple tricks. I use crc32 to turn a word into a number. I use srand/rand to get a predictable order of random values. Note: mt_rand no longer produces the same sequence of random digits with the same seed, so I have to use rand.

srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });

The array $a still has the digits 0 through 8, but they are shuffled. If I use the same keyword I will get the same shuffled order every time. That lets me repeat this every month and get the same result. Then, with a shuffled array, I can pick the digits off the SSN. First, I ensure it has 9 characters (some SSNs are sent as integers and a leading 0 is omitted). Then, I build a masked SSN by picking the digits using $a.

$ssn = str_pad($ssn, 9, '0', STR_PAD_LEFT);
$masked_ssn = '';
foreach($a as $i) $masked_ssn.= $ssn{$i};

$masked_ssn will now have all the digits in $ssn, but in a different order. Technically, there are keywords that make $a become the original ordered array after shuffling, but that is very very rare.

Hopefully this makes sense. If so, you can do it all much faster. If you turn the original string into an array of characters, you can shuffle the array of characters. You just need to reseed rand every time.

$ssn = "111223333"; // Assume I'm using a proper 9-digit SSN
$a = str_split($ssn);
srand(crc32("My secret key"));
usort($a, function($a, $b) { return rand(-1,1); });
$masked_ssn = implode('', $a);

This is not really faster in a runtime way because rand is a rather expensive function and you run rand a hell of lot more here. If you are masking thousands of values as I do, you will want to use an index array that is shuffled just once, not a shuffling for every value.

Now, how do I undo it? Assume I'm using the first method with the index array. It will be something like $a = {5, 3, 6, 1, 0, 2, 7, 8, 4}. Those are the indexes for the original SSN in the masked order. So, I can easily build the original SSN.

$ssn = '000000000'; // I like to define all 9 characters before I start
foreach($a as $i=>$j) $ssn[$j] = $masked_ssn{$i};

As you can see, $i counts from 0 to 8 across the masked SSN. $j counts 5, 3, 6... and puts each value from the masked SSN in the correct place in the original SSN.

kainaw
  • 4,256
  • 1
  • 18
  • 38
  • This is actually an interesting approach. However, there is one issue. You are using a fixed length, 10 digits, which gives you a total of 10000000000 possible numbers. But I do have a limitation and that is the resulting number must be in the range 0-64^4 (16777216) and your approach would surely generate numbers outside of it. P.S. I should have mentioned that `N` is actually a specific number. – Tony Bogdanov May 05 '16 at 19:20
0

Looks like you've got good answer, but still there is an alternative. Linear Congruential Generator (LCG) could provide 1-to-1 mapping and it is known to be a reversible using Euclid's algorithm. For 24bit

Xi = [(A * Xi-1) + C] Mod M
where M = 2^24 = 16,777,216

A = 16,598,013
C = 12,820,163

For LCG reversability take a look at Reversible pseudo-random sequence generator

Community
  • 1
  • 1
Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64