1

I would like an algorithm/function that, given a number N, generates random numbers from 0 to N - 1 in constant time. After the Nth call, the function may do as it pleases. Also, it is important that the algorithm generates the numbers when requested rather than using shuffling, because I may (and in the average case do) not need the full list of numbers. What is the best approach to take?

(optional to read) I thought of having a hash set of numbers, and pulling the numbers out one at a time, but this would require inserting all elements (which I often do not need) into the hash set first... this will not work... Argh

Thanks for any help in advance.

tomKPZ
  • 827
  • 1
  • 10
  • 17
  • You are asking for a shuffle function without an explicit shuffle? You do realize that after N-1 calls, the return value isn't really very random any more, right? And I suppose that a function that returns the numbers in order (which is just as likely as any other "shuffle") doesn't meet your needs. Maybe you want to be more specific in your requirements? Why not a shuffle on the first call? – Floris Aug 20 '13 at 02:57
  • I do not want to shuffle on the first call because the list of elements is large, and I on average only need to iterate through the first few elements. – tomKPZ Aug 20 '13 at 03:05
  • Also, by "constant time" do you mean exactly the same amount of time for each call, or just O(1)? Also, you're not at all clear about exactly what probability distrubution you want--uniformly random is easiest, but your spec seems to imply you want something else. – Lee Daniel Crocker Aug 20 '13 at 03:10
  • So what you actually want is a small random sample (say N numbers) from a large range (say M integers), all unique, where N is much smaller than M? Sounds like Bob Floyd's algorithm. – Lee Daniel Crocker Aug 20 '13 at 03:11
  • See for example http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values – Floris Aug 20 '13 at 03:26

1 Answers1

2

Modify the Fisher–Yates shuffle by replacing the array with a map that stores only the elements that have been moved. In Python:

import random
class Shuffle:
    def __init__(self, n):
        self.d = {}
        self.n = n
    def generate(self):
        i = random.randrange(self.n)
        self.n -= 1
        di = self.d[i] if i in self.d else i  # idiomatically, self.d.get(i, i)
        dn = self.d[self.n] if self.n in self.d else self.n
        self.d[i] = dn
        self.d[self.n] = di
        return di

The space usage and amortized expected running time is O(1) words per element actually generated. Up to log factors, this is optimal.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
  • Floyd's algorithm is simpler and faster. – Lee Daniel Crocker Aug 20 '13 at 05:01
  • Floris's comment above has a link. The exact details of how you would implement it here depend on your needs, which still aren't clear. How big are the set and subset? Do you need an ordered set or just a combination? – Lee Daniel Crocker Aug 20 '13 at 08:16
  • 1
    @LeeDanielCrocker Floyd's algorithm requires the number of calls to be known ahead of time and does not generate a uniform random permutation without additional work. – David Eisenstat Aug 20 '13 at 12:45
  • Wrong on both counts. You're already depending on an existing random-within-range function in the code above, and it will be called with exactly the same arguments using Floyd as it would be for partial Fisher-Yates--it couldn't possibly be otherwise, or the math wouldn't work. Again, if you'll clarify what you want, I can give you code, but I can't do it without a spec, and you've given me no spec here I can work with. – Lee Daniel Crocker Aug 20 '13 at 12:53
  • @LeeDanielCrocker Maybe you should read the answer you linked to. Let's stipulate that the interface is what my Python class delivers: initialization with the parameter N, subsequent calls returning an integer between 0 inclusive and N exclusive chosen uniformly at random from those not already returned. – David Eisenstat Aug 20 '13 at 13:02
  • (Digging through my old code...) Oops, you're right, you have to know the number of calls (i.e, the size of the sample) ahead of time, so I can't use your generator approach. I withdraw my objection (although you are still mistaken about the "not uniform" part--it really does create a perfectly uniform sample and makes exactly the same RNG calls as partial F-Y. It's a shame the OP didn't give enough info to know if it will work for him. – Lee Daniel Crocker Aug 20 '13 at 13:16
  • 1
    @LeeDanielCrocker It generates a uniform random combination but not a uniform random permutation. – David Eisenstat Aug 20 '13 at 13:27
  • You might be right about that too--I'll have to do the math. I use it to generate K-card hands out of a deck, but I only care about combinations. Given the OP's odd and underspecified requirements, your method here is quite clever. – Lee Daniel Crocker Aug 20 '13 at 13:32