I would like a function that will produce k pseudo-random values from a set of n integers, zero to n-1, without repeating any previous result. k is less than or equal to n. O(n) memory is unacceptable because of the large size of n
and the frequency with which I'll need to re-shuffle.
These are the methods I've considered so far:
Array: Normally if I wanted duplicate-free random values I'd shuffle an array, but that's O(n) memory. n is likely to be too large for that to work.
long nextvalue(void) {
static long array[4000000000];
static int s = 0;
if (s == 0) {
for (int i = 0; i < 4000000000; i++) array[i] = i;
shuffle(array, 4000000000);
}
return array[s++];
}
n-state PRNG:
There are a variety of random number generators that can be designed so as to have a period of n
and to visit n
unique states over that period. The simplest example would be:
long nextvalue(void) {
static long s = 0;
static const long i = 1009; // assumed co-prime to n
s = (s + i) % n;
return s;
}
The problem with this is that it's not necessarily easy to design a good PRNG on the fly for a given n
, and it's unlikely that that PRNG will approximate a fair shuffle if it doesn't have a lot of variable parameters (even harder to design). But maybe there's a good one I don't know about.
m-bit hash:
If the size of the set is a power of two, then it's possible to devise a perfect hash function f()
which performs a 1:1 mapping from any value in the range to some other value in the range, where every input produces a unique output. Using this function I could simply maintain a static counter s
, and implement a generator as:
long nextvalue(void) {
static long s = 0;
return f(s++);
}
This isn't ideal because the order of the results is determined by f()
, rather than random values, so it's subject to all the same problems as above.
NPOT hash:
In principle I can use the same design principles as above to define a version of f()
which works in an arbitrary base, or even a composite, that is compatible with the range needed; but that's potentially difficult, and I'm likely to get it wrong. Instead a function can be defined for the next power of two greater than or equal to n
, and used in this construction:
long nextvalue(void) {
static long s = 0;
long x = s++;
do { x = f(x); } while (x >= n);
}
But this still have the same problem (unlikely to give a good approximation of a fair shuffle).
Is there a better way to handle this situation? Or perhaps I just need a good function for f()
that is highly parameterisable and easy to design to visit exactly n
discrete states.
One thing I'm thinking of is a hash-like operation where I contrive to have the first j
results perfectly random through carefully designed mapping, and then any results between j
and k
would simply extrapolate on that pattern (albeit in a predictable way). The value j
could then be chosen to find a compromise between a fair shuffle and a tolerable memory footprint.