5

Random question.

I am attempting to create a program which would generate a pseudo-random distribution. I am trying to find the right pseudo-random algorithm for my needs. These are my concerns:

1) I need one input to generate the same output every time it is used.

2) It needs to be random enough that a person who looks at the output from input 1 sees no connection between that and the output from input 2 (etc.), but there is no need for it to be cryptographically secure or truly random.

3)Its output should be a number between 0 and (29^3200)-1, with every possible integer in that range a possible and equally (or close to it) likely output.

4) I would like to be able to guarantee that every possible permutation of sequences of 410 outputs is also a potential output of consecutive inputs. In other words, all the possible groupings of 410 integers between 0 and (29^3200)-1 should be potential outputs of sequential inputs.

5) I would like the function to be invertible, so that I could take an integer, or a series of integers, and say which input or series of inputs would produce that result.

The method I have developed so far is to run the input through a simple halson sequence:

boost::multiprecision::mpz_int denominator = 1;
boost::multiprecision::mpz_int numerator = 0;

while (input>0) {
    denominator *=3;
    numerator = numerator * 3 + (input%3);
    input = input/3;
}

and multiply the result by 29^3200. It meets requirements 1-3, but not 4. And it is invertible only for single integers, not for series (since not all sequences can be produced by it). I am working in C++, using boost multiprecision.

Any advice someone can give me concerning a way to generate a random distribution meeting these requirements, or just a class of algorithms worth researching towards this end, would be greatly appreciated. Thank you in advance for considering my question.

----UPDATE----

Since multiple commenters have focused on the size of the numbers in question, I just wanted to make clear that I recognize the practical problems that working with such sets poses but in asking this question I'm interested only in the theoretical or conceptual approach to the problem - for example, imagine working with a much smaller set of integers like 0 to 99, and the permutations of sets of 10 of output sequences. How would you design an algorithm to meet these five conditions - 1)input is deterministic, 2)appears random (at least to the human eye), 3)every integer in the range is a possible output, 4)not only all values, but also all permutations of value sequences are possible outputs, 5)function is invertible.

---second update---

with many thanks to @Severin Pappadeux I was able to invert an lcg. I thought I'd add a little bit about what I did to hopefully make it easier for anyone seeing this in the future. First of all, these are excellent sources on inverting modular functions:

https://www.khanacademy.org/computing/computer-science/cryptography/modarithmetic/a/modular-inverses

https://www.khanacademy.org/computer-programming/discrete-reciprocal-mod-m/6253215254052864

If you take the equation next=ax+c%m, using the following code with your values for a and m will print out the euclidean equations you need to find ainverse, as well as the value of ainverse:

    int qarray[12];
    qarray[0]=0;
    qarray[1]=1;
    int i =2;
    int reset = m;
    while (m % a >0) {
      int remainder=m%a;
      int quotient=m/a;
      std::cout << m << " = " << quotient << "*" << a << " + " << remainder << "\n";
      qarray[i] =qarray[i-2]-(qarray[i-1]*quotient);
      m=a;
      a=remainder;
      i++;
  }
if (qarray[i-1]<0) {qarray[i-1]+=reset;}
std::cout << qarray[i-1] << "\n";

The other thing it took me a while to figure out is that if you get a negative result, you should add m to it. You should add a similar term to your new equation:

prev = (ainverse(next-c))%m;
if (prev<0) {prev+=m;}

I hope that helps anyone who ventures down this road in the future.

Jonathan Basile
  • 649
  • 1
  • 10
  • 20
  • 1
    Why 29^3200-1? That's huge; way beyond any C++ type, needing about 15546 bits to store it. – OMGtechy Apr 10 '15 at 20:24
  • 1
    This is called a hash function, you can try % 225 or something – Joshua Byer Apr 10 '15 at 20:25
  • That's why I've been using boost multiprecision. 29^3200 is the number of unique outputs my program needs to be able to produce - if there were, a method, for example, which built them up by adding chars to a string that might work and require less memory, but whatever method i use, that's the number of unique outputs I need the algorithm to produce. – Jonathan Basile Apr 10 '15 at 20:27
  • 1
    My advice is to do a literature search and find out who is doing recent work on PRNGs, then either use their published work if it fits, or contact and pay them to develop what you want. With the range you're specifying, the implied size of the state space is ginormous. This sort of stuff is very hard - even brilliant mathematicians such as von Neumann have had miserable failures. – pjs Apr 10 '15 at 20:27
  • well - i recognize that increasing the range of possible outputs creates different demands - but what I'm most looking for is just the right conceptual approach for getting started thinking about the problem. For example - if you were approaching this problem with a range of possible outputs from 0 to 999 and trying to ensure all possible sequences of 10 integers, how would you do it? – Jonathan Basile Apr 10 '15 at 20:41
  • @Joshua Byer - I've just taken a glance at a description of hash functions - what I'm reading says that they are designed in order not to be invertible, such as for cryptography purposes. is there a class of hash functions that is invertible? – Jonathan Basile Apr 10 '15 at 20:46
  • Hash Functions serve as a way to sort information, they can actually return the same value for multiple inputs.. Meaning there is no way they can be returned to their original state. There are encryption methods like md5 that are built into many languages. You can google them if you like, they are decryptable. – Joshua Byer Apr 10 '15 at 20:53
  • 2
    @JoshuaByer 1. md5 is also a hash function 2. it's deprecated – deW1 Apr 10 '15 at 21:07
  • @Mooing Duck - I'm interested in how you arrived at that figure - I would think 29^1312000 possibilities would translate to about 2^6,500,000 bits at its upper range, about 6.5 MB. Regardless - I realize there are enormous practical difficulties in getting something like this to work, but my question is focused on the theoretical/conceptual angle - like i mentioned above - given a much smaller set of possible values, how would you ensure that it covered not only the distribution but also permutations of sequences, and was invertible? – Jonathan Basile Apr 10 '15 at 23:04
  • 2^6,500,000 bits is not 6,5 Mb, it is more than what you would get by turning every single atom of the universe in a 0-1 bit. Probably some kind of confusion here. – vib Apr 11 '15 at 14:35
  • sorry i meant 6,500,000 bits – Jonathan Basile Apr 11 '15 at 14:58
  • 1
    @JonathanBasile: I mixed number of bits with the powers so I'll do over. The number `29^3200~2^15546` could be stored in 15546 bits. A state machine with `15546~2^13.924` bits could produce at most a single permutation. Every permutation of `410~2^8.679` outputs takes `singlePermSize*410=2^(13.924+8.679)~2^22.603` bits of state as a theoretical minimum, ~778 kilobytes (you wrote 6.5MB, but forgot the bit->byte conversion). So you're mostly right there, I was way off – Mooing Duck Apr 11 '15 at 19:13
  • 1
    Also note that since the theoretical minimum state is 2^22.603 bits, that's both the minimum size of the RNG object itself, and _also_ the minimum size of the seed. If the RNG algorithm isn't _perfect_, it may also require more size and seed-bits. – Mooing Duck Apr 11 '15 at 19:20
  • thank you for clarifying that - It's helpful for me to see the fundamental concepts in play. – Jonathan Basile Apr 11 '15 at 19:50
  • By the title I thought you may need something like this: https://fabiensanglard.net/fizzlefade/index.php but I am not sure after reading the rest of the question. – urnenfeld Jul 11 '20 at 23:27

3 Answers3

2

Ok, I'm not sure if there is a general answer, so I would concentrate on random number generator having, say, 64bit internal state/seed, producing 64bit output and having 2^64-1 period. In particular, I would look at linear congruential generator (aka LCG) in the form of

next = (a * prev + c) mod m

where a and m are primes to each other

So:

1) Check

2) Check

3) Check (well, for 64bit space of course)

4) Check (again, except 0 I believe, but each and every permutation of 64bits is output of LCG starting with some seed)

5) Check. LCG is known to be reversible, i.e. one could get

prev = (next - c) * a_inv mod m

where a_inv could be computed from a, m using Euclid's algorithm

Well, if it looks ok to you, you could try to implement LCG in your 15546bits space

UPDATE

And quick search shows reversible LCG discussion/code here

Reversible pseudo-random sequence generator

Community
  • 1
  • 1
Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • I like this possibility. If I understand it correctly - to get all possible permutations of 3200 integers between 0 and 28 I could use a=(2^15545)+1, c=3, and m=2^15546 - then my formula would be next = ((a*prev+c)%m)%28 - is that the correct way to implement this? – Jonathan Basile Apr 12 '15 at 15:40
  • I think that was a mistake. I see that when I use %28 I get short repeating patterns. I suppose I could use the values mentioned above with the formula (a*prev+c)%m and do a base conversion with the result. – Jonathan Basile Apr 12 '15 at 15:58
  • @JonathanBasile I would recommend to look at the table in `http://en.wikipedia.org/wiki/Linear_congruential_generator` to get the idea about a, m, c. Say, for NumRecipes m=2^32, a=1664525, c=1013904223. For you it might be m=2^15545, a=some large prime number with about equal numbers of 1s and 0s, and with length a bit below of 15545 bits, c=1 for a start, and maybe another big prime number – Severin Pappadeux Apr 12 '15 at 18:18
  • O man, it took me all freaking day to figure out how to invert a modular function, which says much more about my math skills than the difficulty of the task. I'm going to edit my question with what I did so that it's useful (hopefully) to anyone who sees it in the future. Thank you for suggesting this. – Jonathan Basile Apr 13 '15 at 00:19
  • @JonathanBasile glad to hear. Would like to hear what you use as `a,m,c` for your multi-bit LCG – Severin Pappadeux Apr 13 '15 at 03:49
  • hmm...I've been playing around with this more today and find some of the results unsatisfying. The problems mentioned in the literature - that the highest-significance bits repeat at a low periodicity and repetition occurs most starting from low seed numbers - are pretty unsatisfying. Also, the length of the numbers produced has much more regularity, and betrays their mode of production. Some of these are effects I could mask by playing with a/c/m, though I may try a mersenne twister and see if i like the results more. The last combo i tried was: c=15485863^500 a = 2^15049+1 m=2^16275... – Jonathan Basile Apr 13 '15 at 18:09
  • I think Ill try an a value closer to m to see if that masks the length similarities somewhat. – Jonathan Basile Apr 13 '15 at 18:10
  • Yes, there is known problem `While LCGs are capable of producing pseudorandom numbers which can pass formal tests for randomness, this is extremely sensitive to the choice of the parameters c, m, and a.` this is why I'm curious how to select a,m,c to get reasonable generator. Did you look at Wiki link and Hull-Dobell Theorem mentioned there? – Severin Pappadeux Apr 13 '15 at 18:44
  • Yes, I made sure m & c were coprime, and that a-1 would be divisible by 4 and by the prime factor of m (2). But even though this ensures the period will be around m, there is still very un-subtle repetition at much smaller intervals. for example, every time the pattern rounds m, when you get a value very close to 0, that value will recur (all but its last couple of bits) again three times in the next 1000 results. – Jonathan Basile Apr 13 '15 at 19:07
2

In your update, "appears random (to the human eye)" is the phrasing you use. The definition of "appears random" is not a well agreed upon topic. There are varying degrees of tests for "randomness."

However, if you're just looking to make it appear random to the human eye, you can just use ring multiplication.

  • Start with the idea of generating N! values between 0 and M (N>=410, M>=29^3200)
  • Group this together into one big number. we're going to generate a single number ranging from 0 to *M^N!. If we can show that the pseudorandom number generator generates every value from 0 to M^N!, we guarantee your permutation rule.
  • Now we need to make it "appear random." To the human eye, Linear Congruent Generators are enough. Pick a LCG with a period greater than or equal to 410!*M^N satisfying the rules to ensure a complete period. Easiest way to ensure fairness is to pick a LCG in the form x' = (ax+c) mod M^N!

That'll do the trick. Now, the hard part is proving that what you did was worth your time. Consider that the period of just a 29^3200 long sequence is outside the realm of physical reality. You'll never actually use it all. Ever. Consider that a superconductor made of Josephine junctions (10^-12kg processing 10^11bits/s) weighing the mass of the entire universe 3*10^52kg) can process roughly 10^75bits/s. A number that can count to 29^3200 is roughly 15545 bits long, so that supercomputer can process roughly 6.5x10^71 numbers/s. This means it will take roughly 10^4600s to merely count that high, or somewhere around 10^4592 years. Somewhere around 10^12 years from now, the stars are expected to wink out, permanently, so it could be a while.

Cort Ammon
  • 10,221
  • 31
  • 45
2

There are M**N sequences of N numbers between 0 and M-1. You can imagine writing all of them one after the other in a (pseudorandom) sequence and placing your read pointer randomly in the resulting loop of N*(M**N) numbers between 0 and M-1...

def output(input):
    total_length = N*(M**N)
    index = input % total_length
    permutation_index = shuffle(index / N, M**N)
    element = input % N
    return (permutation_index / (N**element)) % M

Of course for every permutation of N elements between 0 and M-1 there is a sequence of N consecutive inputs that produces it (just un-shuffle the permutation index). I'd also say (just using symmetry reasoning) that given any starting input the output of next N elements is equally probable (each number and each sequence of N numbers is equally represented in the total period).

6502
  • 112,025
  • 15
  • 165
  • 265