4

Possible Duplicate:
How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N

What are some alternative methods to generate 1000 distinct random integers in the range [0,8000] as opposed to the following:

  1. naive method: generating a number and checking if it's already in the array. O(n^2)
  2. linear shuffle: generate sequence 0 to 8000, shuffle, take the first 1000. O(n)
Community
  • 1
  • 1
snap
  • 711
  • 3
  • 11
  • 25
  • 3
    How is the first method 'naive'? Checking if it is in the array makes it NOT RANDOM. If it is random, DUPLICATES ARE EXPECTED. – Amy B Apr 04 '10 at 21:56
  • method 1 would be ok if you use a set instead of an array. – Nick Dandoulakis Apr 04 '10 at 21:58
  • 2
    duplicates: http://stackoverflow.com/questions/196017/unique-random-numbers-in-o1 http://stackoverflow.com/questions/158716/how-do-you-efficiently-generate-a-list-of-k-non-repeating-integers-between-0-and-n – Nick Dandoulakis Apr 04 '10 at 22:12
  • @Coronatus Are you suggesting it's impossible to randomly select 1000 from 8000 without replacement? If you can show how method 1 is statistically biased, please do. – Nick Johnson Apr 04 '10 at 22:28
  • 3
    Where do you get the O(n^2) in the first case? Each number needs to be tried `m` times, where `m` is geometrically distributed. This means that the complexity is worse. – Jørgen Fogh Apr 04 '10 at 22:58
  • 1
    for(i=0;i<1000;i++) print i; (this is a joke, of course) –  Apr 04 '10 at 23:01
  • You need to check, 1, 2, 3, 4, 5, 6, ..., n times. 1+2+3+...+n = n(n+1)/2 = O(n^2) – snap Apr 04 '10 at 23:05
  • 1
    @The Last Ninja: You need to check again until you get a new number. This means that you need to run through the list a geometrically distributed number of times for each new number. I don't know how to simplify the sum to find the answer, but I think it is O(n^3). – Jørgen Fogh Apr 04 '10 at 23:23
  • The problem with method one is that how long it takes is depends on how your number generating function works. If you are doing something like rand()%8001 you may end up getting in a situation where the period is less than 1000 and you are now in an infinite loop. – Dolphin Apr 07 '10 at 16:09

5 Answers5

12

You can use a partial Fisher-Yates shuffle implemented using swaps. One of the nice features of this algorithm is that if you stop after k swaps, the first k numbers are a random sample of size k from the complete set.

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • +1. I was going to suggest just shuffling the whole array; for some reason this obvious optimisation didn't occur to me. – Nick Johnson Apr 04 '10 at 22:28
  • Problem is that you still need to create an array from [0,8000]. Can you create a permutation without the overhead? What if you wanted 10 unique numbers between 1..1,000,000? – Ray Apr 04 '10 at 23:17
  • 2
    Technically you could make it a sparse array implemented as a dict instead, and any position which doesn't have a value set is simply taken to be equal to its index. – Amber Apr 04 '10 at 23:23
  • @Dav: Good suggestion. This would save the O(n) construction of the list of integers from [0, 8000]. – Mark Byers Apr 04 '10 at 23:26
  • I don't recall mentioning anything about a bitarray? – Amber Apr 05 '10 at 00:21
  • @Dav: (great idea: was doing something along those lines for this and in general, k-permutations: http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values/2394308#2394308) – Andras Vass Apr 05 '10 at 19:33
2

You could create a list containing the numbers 0 to 8000.

Then looping 1000 times generate a random number between 0 and the length of the list.

Remove that element from the list and add it to an output list.

By removing the element you ensure that your selections are unique.

while (outputList.Count < 1000)
{
    index = random.Next(0, inputList.Count);
    outputList.Add(inputList[index]);
    inputList.RemoveAt(index);
}
ChrisF
  • 134,786
  • 31
  • 255
  • 325
1

This is from Knuth's the Art of Programming (via Jon Bentley's Programming Pearls), implemented in Python:

import random

# randomly select m numbers from n candidates    
def random_select(m, n):
    select = m
    result = []
    for i in xrange(n):
        if random.randint(0, n-i) < select:
            result.append(i)
            select -= 1
    return result

random_select(1000, 8000)

this will generate a list of random numbers in numerical order. It works by iterating over all the integers from 0-n (i.e 0-8000), and randomly selecting them with a probability of(number left to select / number of remaining candidates). It runs in O(n), so do not try it if n is very large compared to m - e.g. selecting ten numbers out of a billion. It uses no memory other than the result list (m) and a few local variables, unlike solutions that rely on shuffling a list of length n.

If you want the result in a random order then shuffle the list afterwards.

Dave Kirby
  • 25,806
  • 5
  • 67
  • 84
1

Partial Fisher-Yates, as @Mark has suggested, with a little twist, storing the swaps along the way.
This way, it will at most consume as much memory as the result list O(m).
It will also run in O(m) - not O(n), as other solutions that enumerate the whole range - so it should not have problems on larger ranges.
This way, you can have the best of both worlds.

/// <summary>
/// Generates unique random numbers
/// <remarks>
/// Worst case memory usage is O(min((emax-imin)/2, num))
/// </remarks>
/// </summary>
/// <param name="random">Random source</param>
/// <param name="imin">Inclusive lower bound</param>
/// <param name="emax">Exclusive upper bound</param>
/// <param name="num">Number of integers to generate</param>
/// <returns>Sequence of unique random numbers</returns>
public static IEnumerable<int> UniqueRandoms(
    Random random, int imin, int emax, int num)
{
    int dictsize = num;
    long half = (emax - (long)imin + 1) / 2;
    if (half < dictsize)
        dictsize = (int)half;
    Dictionary<int, int> trans = new Dictionary<int, int>(dictsize);
    for (int i = 0; i < num; i++)
    {
        int current = imin + i;
        int r = random.Next(current, emax);
        int right;
        if (!trans.TryGetValue(r, out right))
        {
            right = r;
        }
        int left;
        if (trans.TryGetValue(current, out left))
        {
            trans.Remove(current);
        }
        else
        {
            left = current;
        }
        if (r > current)
        {
            trans[r] = left;
        }
        yield return right;
    }
}
Community
  • 1
  • 1
Andras Vass
  • 11,478
  • 1
  • 37
  • 49
0

Sorted list with no sort, O(n)

If you want the integers sorted, I got to this answer in another question with a lot of help. You can do it using an exponential variate and thereby avoid any sort. As a result it is O(n):

From Alok's answer and Dan Dyer's comment it turns out that using an exponential distribution for a set of deltas gives a uniform distribution of integers in sequence.

So, you just start generating numbers and then scale them at the end. Adding 1 to the delta ensures you never repeat a value.

import random,sys,math

def genSortedInts(mini,maxi,vals):
    running = 0
    deltas = [random.expovariate(1.0) for i in range(0,vals+1)]
    floats = []
    for d in deltas:
        running += d
        floats.append(running)
    upper = floats.pop()
    valRange = maxi-mini-(vals-1)
    ints = [mini+int(f/upper*valRange)+id for id,f in enumerate(floats)]
    return ints

if __name__ == "__main__":
    vals = 10
    maxi = 80
    mini = 0
    print(genSortedInts(mini,maxi,vals))

Note the use of random.expovariate(1.0), a Python exponential distribution random number generator (very useful!). Here it's called with a mean of 1.0 (arg is 1/mean), but since the script normalises against the last number in the sequence, the mean itself doesn't matter.

Output (fair dice roll) for 10 values up to 80:

[3, 5, 10, 16, 25, 37, 41, 45, 57, 70]
Community
  • 1
  • 1
Phil H
  • 19,928
  • 7
  • 68
  • 105