Generating a deterministic int from another with no duplicates

Question

I'm looking to create a deterministic number generation function where the input number will always generate the same number, but no two numbers will end up generating the same result.

E.g.:

1 -> 3
2 -> 5
3 -> 4
4 -> 2
5 -> 1

However I need this to work for all numbers that can be represented by a specific datatype, e.g. an int64.

This feels like something that should either be really straightforward, or completely impossible. Is there some random number generation scheme that guarantees this sort of distribution without me having to create an array of all possible numbers, randomly sort, and then use the index (and making me run out of memory in the meantime)?

Many thanks F

This is called a permutation and you'll find enough stuff online on how to generated a permutation, e.g. by slapping together a few cycles. — Volker, May 10 '22 at 15:12
Technically speaking you didn't specify which properties of permutation do you require. These trivial ones `n -> n` or `n -> (n+1) mod 2^64` also satisfy your question :-D — vvv444, May 06 '23 at 09:11

colm.anseo · Accepted Answer · 2022-05-11T17:48:19.633

6

The transformation formula you need is:

f(P) = (mP + s) mod n

// n = range - so for uint64 2^64
// s < range i.e. < 2^64
// m = must be coprime with n

This is modular arithmetic as used in the Affine cipher.

The mod ensures it is within desired range, s is a simple shift and m should be coprime with n. Coprime simply means n and m should not share any common factors. Since n is 2^64 its only factors are the number 2 - so m should basically not be even (i.e. not divisible by 2):

So for uint64 range:

var (
    m = uint64(39293)    // some non-even number
    s = uint64(75321908) // some random number below 2^64
)

func transform(p uint64) uint64 {
    return p*m + s // implicitly mod'ed 2^64 by the type's size
}

This may appear magic, but you can convince yourself it works with uint16:

https://go.dev/play/p/EKB6SH3-SGu

(as uint64 would take quite some resources to run :-)

Update:

For signed numbers (i.e. int64) the logic is no different. Since we know we have a unique one-to-one mapping with uint64 one approach would simply be to convert inputs & outputs from uint64 to int64 and vice versa:

// original unsigned version
func transform(p uint64) uint64 {
    return m*p + s
}

func signedTransform(p int64) int64 {
    return int64(transform(uint64(p)))
}

again here's a int16 example proving there's no collisions:

https://go.dev/play/p/Fkw5FLMK0Fu

edited May 11 '22 at 17:48

answered May 10 '22 at 14:55

colm.anseo

19,337
4
43
52

Interesting. I need to spend a little bit of time convincing myself that this avoids duplicates and gives a uniform distribution... Is this transform called something? Can I read up about it somewhere? – FlamFace May 10 '22 at 14:58
update with math references & anecdotal test code – colm.anseo May 10 '22 at 15:27
1

It is known also as Linear Congruential Generator, Hull–Dobell Theorem etc. Added as an answer below – Severin Pappadeux May 10 '22 at 17:45
"Since uint64 would take quite some resources to run". If you have access to exabyte storage, running it would be easy using jump-ahead method, cores are cheap. – Severin Pappadeux May 10 '22 at 17:57
1

@SeverinPappadeux I was referring to the Go playground which has an upper limit on execution runtime. – colm.anseo May 10 '22 at 18:03
@colm.anseo I just thought for a test you need (before hitting initial seed and had to stop) a counter and 2^64 bits array - to set bits at the position where you generate value. Then you could check counter to be 2^64 and bit array for zeroes. 2^64 bits is 2^56 bytes - more than a petabyte, less than exabyte. Then with jump-ahead you could fill parts of the bit array at will, in parallel – Severin Pappadeux May 11 '22 at 01:18
@colm.anseo. Just one follow up as I’m struggling to figure it out. Is there a way of having this work for a signed integer data type, where the inputs will always be positive, but have all outputs be positive too? I’ve tried bitshifting the MaxInt64 I’m modding by to halve it, and then re-adding that to move it into the positive number space, but I must be doing something wrong as I’m always getting clashes. The only solution I have so far is recursively calling it if we get a negative number, which will work, but is horrible. – FlamFace May 11 '22 at 16:42
@FlamFace update answer with signed numbers example: the logic & one-to-one mapping guarantee remains the same. – colm.anseo May 11 '22 at 17:49
Apologies, and thank you for being so comprehensive, I think I didn’t describe it well. This signed solution makes sense, but we get positive integers for a negative input, and vice versa. My goal, given that i know all inputs will be positive is to have all outputs be positive too. Basically, I want to mod with an int31 and shift by an int31 too, but I can’t seem to get that working… – FlamFace May 11 '22 at 18:22
If your type is `int32` you want to mod `2^31` so only positive numbers appear. A simple mask can perform this mod and will prevent the upper (negative) bit being set: `return (m*p + s) & (1<<31 - 1)` – colm.anseo May 11 '22 at 19:47

Severin Pappadeux · Answer 2 · 2022-05-10T17:57:51.500

To add to colm.anseo answer, this kind of mapping is also known as Linear congruential generator.

X_n+1 = (aX_n+c) mod m

When c ≠ 0, correctly chosen parameters allow a period equal to m, for all seed values. This will occur if and only if:

m and c are relatively prime,
a-1 is divisible by all prime factors of m,
a-1 is divisible by 4 if m is divisible by 4.

These three requirements are referred to as the Hull–Dobell Theorem.

For 64bit LCG a=6364136223846793005 and c=1442695040888963407 from Knuth looks good.

Note, that LCG mapping is 1-to-1, it maps whole [0...2⁶⁴-1] region to itself. You could invert it if you want. And as RNG it has distinctive ability for jump-ahead.

Generating a deterministic int from another with no duplicates

2 Answers2

Linked

Related