Non-repetitive random seek in a range Algorithm

Question

I'm looking for an efficient algorithm that produces random values within a range, without repetitions.

In pseudo-code: (in class Rand)

  Rand(long from, long to) {
    this.from = from;
    this.to = to;
    // ...
  }

  long getNumber() {

    // returns a random number in the [from, to] range
    //  which has never been returned before
  }

Usage:

  Rand r = new Rand(1, 100000000);

  long x = r.getNumber();
  long y = r.getNumber();
  ...

The numbers returned from r.getNumber() should always be different from the ones previously returned.
Of course, if to - from + 1 numbers were returned, the algorithm should start again (or fall in error - not important anyway).

Note that the range may be very large, and thus a randomly arranged array (containing initially [from, to] numbers) is likely to overflow the memory.

Any mathematical closed formula that would fit the purpose is welcome... (Like unique pseudo-random generation, given a root). — Déjà vu, Aug 19 '11 at 03:39
Given the code, I assume you're talking about R-Java. the R tag gives a bit confusion, as the class R (from Java) is something else than the programming language R (for which the tag is used). I've updated the tags. — Joris Meys, Aug 19 '11 at 12:09
@Joris Actually this *R* was brief for Random. Not related to the R statistical language. The rJava tag is removed. — Déjà vu, Aug 19 '11 at 12:16
I made the same mistake: forgot that rjava is an R package as well. If it's about Java (which I presume), you can add the Java tag for clarity — Joris Meys, Aug 19 '11 at 12:21
@Joris Hmm it could in any language actually. I thought Java was convenient because I wanted a constructor, but could be in C++, or in C. — Déjà vu, Aug 19 '11 at 12:37

score 5 · Accepted Answer · answered Aug 19 '11 at 12:09

5

A cypher is a one-to-one mapping, otherwise it couldn't be decrypted. Hence, any block cypher will map the numbers 0, 1, 2, 3, 4, 5, ... to different n-bit numbers, where n is the cypher's block size in bits.

It is relatively easy to put together a simple 4-round Feistel cypher with whatever (even) block size you want. With only four rounds it would be fast but insecure. Alternatively use the Hasty Pudding cypher which can have pretty much any block size you want.

Whatever cypher you use, just encrypt the numbers 0, 1, 2, ... and look at the output block. You can throw away any results that are outside your required range and all results are guaranteed unique.

answered Aug 19 '11 at 12:09

rossum

15,344
1
24
38

Thanks, this is an idea I had also, using md5 (will not have collisions from 1..2^bigEnough). The only problem is the generation will always be the same - thus not random. I.e. md5(5) will always return the same result. – Déjà vu Aug 19 '11 at 12:30
@ring0: With MD5 (or any crypto sized hash) collisions are highly unlikely, but possible. You can vary the output by concatenating the counter with a particular key: "0keyA", "1keyA", "2keyA" ... and "0keyB", "1keyB", "2keyB" ... in much the same way you would use a cypher key to vary the output from the counter that I suggested. – rossum Aug 19 '11 at 12:50
I'm going to admit that block ciphers are the way to go, with some comments and reservations. The Luby-Rackoff analysis referred to in http://en.wikipedia.org/wiki/Feistel_cipher#Theoretical_work suggests that e.g. if you base your round functions on HMAC the result shouldn't be too insecure. However for a range of 0..3 there are 4! = 24 permutations, but a 4-round Fiestel cipher can be described with 8 bits - the round values of 0 and 1 at each stage. 24 does not divide 256, so some permutations are more likely than others. But every solution here faces that problem - so add more rounds. – mcdowella Aug 20 '11 at 10:27

Johan · Answer 2 · 2013-04-08T17:55:28.210

@rossum said:

A cypher is a one-to-one mapping, otherwise it couldn't be decrypted. Hence, any block cypher will map the numbers 0, 1, 2, 3, 4, 5, ... to different n-bit numbers, where n is the cypher's block size in bits.

So even xor encryption or bitwise reverse would do for some purposes.

And here is a php function using xor and bitwise reverse as a simple 1-to-1 encryption.

It is a pseudo random number generator with guaranteed fill in of all values and no identical values. You supply n:0..63 and it provides a random 0..63.

It only accepts 2^i ranges, like 0..63, 0..127 etc.

Is not cryptographically safe etc, just random.

Such a function is extremely suitable for garbage collection routines, as it will not attempt to clean the same area twice, even though doing things randomly.

 function math_random_filled($n,$bits,$seed)
 {
   //bits: examples: 6=0..63, 8=0..255, 10: 0..1023
   //n: 0<= n <2^$bits
   //seed: any string or number

   //generate xor: crc32 + bit mask
   $xor= crc32($seed.'internalseed') & ~(-1<<$bits);
   //xor
   $r=intval($n)^$xor;
   //bitwise reverse
   $r=bindec(strrev(str_pad(decbin($r),$bits,'0',STR_PAD_LEFT)));
   return $r;
 }

 //demonstration
 $bits=6;
 $min=0;
 $max=pow(2,$bits)-1;
 $count=$max-$min+1;
 for($n=0;$n<=$max;$n++)
 {
   $r=math_random_filled($n,$bits,$seed='someseed');
   echo " $r ";
   $ar[$r]=1;
 }


 $set=0;
 for($n=0;$n<=$max;$n++)
   if(isset($ar[$n])) $set++;


 echo "\n"."bits: $bits,  count: $count, set: ". $set."\n\n";

example output:

 37  5  53  21  45  13  61  29  33  1  49  17  41  9  57  25  39  7  55  23  47  15  63  31  35  3  51  19  43  11  59  27  36  4  52  20  44  12  60  28  32  0  48  16  40  8  56  24  38  6  54  22  46  14  62  30  34  2  50  18  42  10  58  26 

bits: 6,  count: 64, set: 64

you can test the code here in php sandbox

And here is another one, but accepts any range, not only powers of 2.

function math_random_filled_arithmetical($n,$max,$seed)
    {
    /*
    - produces 0..max, repeatable, unique, one-to-one mapped random numbers
    - uses arithmetic operations to imitate randomness
    - $n: 0<=$n=<$max
    - $max: any integer, not only power of two
    - $seed: any string or number
    */
    $n=intval($n);
    $max=intval($max);
    $opt1=$n;
    $opt2=$max-$n;
    $n2=min($opt1,$opt2);
    $reverseit=crc32($seed.'internalseed'.$n2)&1;
    if($opt1>$opt2) $reverseit=!$reverseit;
    $max2=floor(intval($max-1)/2);
    //echo "n:$n, max:$max,n2:$n2,max2:$max2,reverseit:$reverseit\n";
    if($max>=3)
    if($opt1!=$opt2)
        $n2=math_random_filled_arithmetical($n2,$max2,$seed.'*');
    $res=$reverseit? $max-$n2:$n2;
    $res=intval(fmod($res+(crc32($seed)&(1<<30)),$max+1));
    //echo "n:$n, max:$max, res:$res\n";
    return $res;
    }


//demonstration
$max=41;//-- test a max value 
for($n=0;$n<=$max;$n++)
    {
    $r=math_random_filled_arithmetical($n,$max,$seed='someseed');
    $ar[$r]=1;
    echo " $r ";
    }
$filled=0;
for($n=0;$n<=$max;$n++)
    if(isset($ar[$n])) $filled++;
echo "\n count: ".($max+1).", filled: ". $filled."\n";

example output:

 20  19  18  17  33  32  37  36  14  13  31  34  35  26  16  11  12  3  39  40  0  41  1  2  38  29  30  25  15  6  7  10  28  27  5  4  9  8  24  23  22  21 
 count: 42, filled: 42

related php sandbox is here

Interesting, +1. Reversing bits adds to the "random" effect. But I'm not sure the crc32 is necessary as long as you get an initial random *xor* value in the specific range.. — Déjà vu, Apr 08 '13 at 05:14
Thanks! Re: crc32, it serves as an xor cooker, it accepts any concatanation of strings and numbers, converts them to a pseudo random 32 bit value. This allows easy user seeding, otherwise user would have to supply a sensible seed xor value with a nice mixture of 1s and 0s. — Johan, Apr 08 '13 at 14:03

Patrick87 · Answer 3 · 2011-08-19T04:06:43.330

One way to do this would be to generate a list of numbers between from and to, removing these at random until the bag is empty, at which point it's re-populated. To save on storage for large ranges, you can record picked numbers up to a point (re-picking when a duplicate is chosen), since the probability of picking a duplicate should be low initially. Determining the optimal transition point will probably be an empirical exercise.

EDIT: Some more thoughts.

For truly huge ranges, not even that will provide good performance under the memory limitation. One idea might be to store the candidates not as a list of numbers, but as an interval. So, initially, you choose between from and to, getting x1. Next time, pick from a number from the first subinterval or the second, with probability in proportion to the interval length. At each step, eliminate intervals which have zero length. This requires storing M + 2 integers (in the worst case), where M is the number of draws, or N/2 asymptotically for large N (in the worst case), where N is the initial interval size. Somebody might double-check me, though.

hamstergene · Answer 4 · 2011-08-19T04:48:08.393

If you don't require every number from the interval to appear eventually, you can use a linear congruental generator:

int getNumber() {
    seed = (seed * A + C) mod (to-from);
    return seed + to;
}

It's periodical, the new period begins when seed becomes equal to the initial value, and the length of the period depends on A and C choice.

Pros: O(1) time and space, cons: not every number from the interval will appear.

For intervals of length 2^m, take a look at http://en.wikipedia.org/wiki/Linear_feedback_shift_register I did not use it, but wikipedia says it is possible it to be maximum-length, i.e. you can have all numbers (except one) appear in the output.

score 1 · Answer 5 · answered Aug 19 '11 at 06:03

1

Some thoughts for a starting point:

1) Supposes that you have a function f(x) which is a permutation on 1..N where N is larger than your range. If you apply it to x within the range it may produce an illegal value - one outside your range. You could define a permutation within your range by simply calling f again on the illegal value. You will eventually come to a legal value because the sequence x, f(x), f^2(x), f^3(x) must eventually cycle and if the worst comes to the worst it will come back in at x.

2) There are switching networks which allow you to produce all possible permutations on N objects for special N - one example is http://en.wikipedia.org/wiki/Clos_network#Bene.C5.A1_network_.28m_.3D_n_.3D_2.29 (Funny URL - Benes network). You can get an arbitrary permutation on N objects, where N may I think have to be a power of 2, by setting the switches at random. Since there will be K switches there are 2^K ways of setting them which means that you do not have M! permutations with equal probability but perhaps you won't mind this, or can minimise the non-randomness by repeating this multiple times, or something.

3) If you are prepared to achieve near-randomness by applying many different base permutations many different times you could try alternately adding mod N across the whole range and then splitting the range into sub-ranges and e.g. for some stretch of p-1 values within the range apply the permutation produced by multiplying by some k mode p. The hope would be that although each individual step is pretty basic by appling enough of them and making them diverse enough the result would be close to being random, especially if you chose the parameters at random.

answered Aug 19 '11 at 06:03

mcdowella

19,301
2
19
25

1) Permutations of 1..N... if N is large e.g. 10^9 what happens? Also how do you prevent collisions? f(1) is supposed to give the 1st permutation or random? 2) I'm not familiar with the Clos-Benes networks, but if I read you correctly, I should take log2(N) inputs and outputs and construct the network in between. So even with N=10^9, I only have 32 I/O. Looks like a good idea, but would you know how to implement the network from that? 3) I'm lost :-) Maybe an example? – Déjà vu Aug 19 '11 at 12:49
All of these ideas are ways to make random permutations f(). Then you generate the non-repetitive numbers by computing f(1), f(2), ... (1) is a way to make a random permutation on 1..N if you have a random permutation on 1..M where M is greater than N but not too much greater. I was thinking of cases where you could construct a random permutation for the next biggest power of 2 or next biggest prime or something. – mcdowella Aug 19 '11 at 17:49
(2) Works better if you base it on the batcher sorting network. The idea is to work out how you would build a network to sort N objects using a source of random numbers to set the switches at each level and then compute f(x) by following x through the network to see where it ends up without computing the settings for the switches that x does not actually cross over. So given a seed for your random number source you could compute the setting for switch i by combining the seed and the switch number i with something like http://en.wikipedia.org/wiki/HMAC, – mcdowella Aug 19 '11 at 17:52
(3) is a bit of a hack. Given a range between 1..N I might claim that you could compute f(x) by generating a random number K and making f(x) = x + K for all x. This obviously isn't very random. Another idea is to split the range into two parts 1..T and T+1..N and add L mod T when x is in the first range and do a similar trick mod N-T in the second range. Or if T was a prime you could multiply the first range by L mod T. This isn't very random either but if you put together enough of these stages you might get something acceptably random. – mcdowella Aug 19 '11 at 17:57
Another explanation for (1) - suppose you want a range of 0..13 and you base your answer on a block cipher which gives you a range of 0..15. I am saying that if you evaluate your block cipher on 1 to get f(1) = 14, then you should evaluate f(14) and take it as your answer if it is in range. It can't be 14 because f(1) = 14. It could be 15, but if it is then you will find that f(15) is in range so you are OK. Another way to look at this is to see that if you could write down the http://en.wikipedia.org/wiki/Cycle_notation for the permutation what you doing is editing out all the illegal values – mcdowella Aug 20 '11 at 10:32
It turns out that this is a known problem - see http://en.wikipedia.org/wiki/Format-preserving_encryption. It also turns out, by the way, that Feistel networks generate only even permutations! - although this may not be a problem in practice. Thanks for this problem - I have found it amusing enough to put some notes on http://www.mcdowella.demon.co.uk/PermutationFromHash.html – mcdowella Aug 28 '11 at 20:06

Ricky Bobby · Answer 6 · 2011-08-19T13:19:06.640

1

You could proceed this way (in python):

Create a xrange and pick k random element from it and use it as a RanndomGenerator:

import random

def randomFromTo(FROM,TO,k): #k is the number of sample you want to generate
    m= random.sample(xrange(FROM,TO),k)
    return (i for i in m)

Your Generator will generate all number randomly from FROM to TO and fail when it has generated more than k numbers:

with this example you would get :

RandomGenerator=randomFromTo(10,1000000000,12)

for k in range(12): 
    print RandomGenerator.next()

you get

edited Aug 19 '11 at 13:19

answered Aug 19 '11 at 10:35

Ricky Bobby

7,490
7
46
63

1

What happens if the range is large, e.g. 10^9? Doesn't python create the range internally(?) – Déjà vu Aug 19 '11 at 12:17
@ring0 ... ok I updated it in the case you know approximately the number of random number you will generate. If you want to generate 1000 number between 1 and 10^9 it will work. xrange is not actually creating the list. – Ricky Bobby Aug 19 '11 at 13:20
@ring0 I just realised it's the same function as Roman Luštrik suggested. but in python – Ricky Bobby Aug 19 '11 at 13:57
Do you think the R and python implementations are the same? Or only the functionality is identical? – Déjà vu Aug 19 '11 at 14:50
@ring0 , not sure the implementations are the same. You can find the python implementation here http://svn.python.org/view/python/branches/release27-maint/Lib/random.py?view=markup . But I don't know much about R, sry. – Ricky Bobby Aug 19 '11 at 15:05

score 1 · Answer 7 · answered Aug 19 '11 at 11:42

I will pretend you are asking for a solution in R (according to tag). What you're trying to do is sample without replacement. In R, there's a function called sample. Here, I'm sampling a vector of 30 values (1, 2, 3... 30), and once I draw a number, it's not replaced. You can make this reproducible on other machines by setting a seed (see set.seed).

I ran this a number of times to show the randomness.

> sample(x = 1:30, size = 10, replace = FALSE)
 [1]  9 16 13 20 12  3  1  5 28  7
> sample(x = 1:30, size = 10, replace = FALSE)
 [1] 22 11 26 29 20  1  3  6  7 10
> sample(x = 1:30, size = 10, replace = FALSE)
 [1]  1 11 16  7 22 26  3 25  8  9
> sample(x = 1:30, size = 10, replace = FALSE)
 [1]  7 17  3 22 21 24 27 12 28  2
> sample(x = 1:30, size = 10, replace = FALSE)
 [1] 30 21 23  2 27 24  3 18 25 19
> sample(x = 1:30, size = 10, replace = FALSE)
 [1]  4  6 11 16 26  8 17 22 23 25

Sorry for the confusion - I didn't set the rJava tag (removed now). The `sample()` function looks nice anyway. The advantage over the classical solutions is the `x` range may be large, while `size` could be rather small. And there is no collision. +1 — Déjà vu, Aug 19 '11 at 12:28

Non-repetitive random seek in a range Algorithm

7 Answers7

Linked