4

I want to generate n different numbers between 1 and N (of course n <= N). N could be very large. If n is very small, one efficient way is to generate a number and compare it against the set of numbers we already have, to make sure it's new. That takes O(n^2) time and O(n) memory. If n is quite large, we can use the Fisher–Yates shuffle algorithm to generate a random permutation and stop after n steps. That takes O(n) time, but it also requires O(N) memory.
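For concreteness, here is a rough sketch of both approaches (the function names and signatures are mine, just for illustration):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Naive rejection sampling: O(n^2) time, O(n) memory.
std::vector<int> sample_rejection(int n, int N, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, N);
    std::vector<int> out;
    while ((int)out.size() < n) {
        int x = dist(rng);
        // Linear scan over everything generated so far.
        if (std::find(out.begin(), out.end(), x) == out.end())
            out.push_back(x);
    }
    return out;
}

// Truncated Fisher-Yates shuffle: only n swap steps, but O(N)
// memory because the whole range 1..N has to be materialized.
std::vector<int> sample_fisher_yates(int n, int N, std::mt19937& rng) {
    std::vector<int> all(N);
    std::iota(all.begin(), all.end(), 1);  // fill with 1, 2, ..., N
    for (int i = 0; i < n; ++i) {
        std::uniform_int_distribution<int> dist(i, N - 1);
        std::swap(all[i], all[dist(rng)]);
    }
    all.resize(n);  // the first n entries are the sample
    return all;
}
```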

Here is the question: what can we do if we do not know in advance how large n is? I would like an algorithm that uses only O(n) memory and stops after O(n) time. Is that possible?

nicolas.wong
  • That's a pretty poor duplicate - N there is 1000, here could be "very large". – jrok Oct 31 '13 at 15:14
  • @j_random_hacker That uses O(N) memory (not O(n)). – Bernhard Barker Oct 31 '13 at 15:14
  • @jrok: Fair enough, close vote retracted. I do note however an O(1)-space solution on that page: http://stackoverflow.com/a/202225/47984. – j_random_hacker Oct 31 '13 at 15:18
  • Jerry Coffin's dupe is a much better fit for this -- it's completely unbiased, and if a tree structure is used to hold the numbers chosen then it's O(n)-space and O(log n) time to select each element. I doubt any data structure can improve this to O(1) time to select each element. – j_random_hacker Oct 31 '13 at 15:24
  • How can you not know how large `n` is, yet perform the calculation? What is wrong with an algorithm that says "oh look, n is 5. I will do this..." – Floris Oct 31 '13 at 15:27
  • @Floris: The way I interpret it is that they want an *online* algorithm -- i.e. one where it's always possible to cheaply add a new, distinct sample later on. – j_random_hacker Oct 31 '13 at 15:28
  • @j_random_hacker: If you implement the set as a hash table you can get O(1) (at least for the expected time) -- see the sketch after these comments. – Jerry Coffin Oct 31 '13 at 15:45
  • @JerryCoffin: You're right, I was getting my n and N confused... but somehow only w.r.t. the hashtable, not the tree! – j_random_hacker Oct 31 '13 at 15:47
  • @j_random_hacker: Happens to the best of us (well, I guess I don't know about the best, but I've seen it happen to a lot better than me, anyway). – Jerry Coffin Oct 31 '13 at 15:50
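
A minimal sketch of the online, hash-table approach discussed in the comments above (the class name and its interface are illustrative only):

```cpp
#include <random>
#include <unordered_set>

// Values drawn so far live in a hash set, so each duplicate check is
// O(1) expected. While n stays well below N (say n <= N/2), each call
// needs only a constant expected number of retries.
class OnlineSampler {
    std::mt19937 rng_{std::random_device{}()};
    std::uniform_int_distribution<int> dist_;
    std::unordered_set<int> seen_;
public:
    explicit OnlineSampler(int N) : dist_(1, N) {}
    int next() {  // draw one more value, distinct from all earlier ones
        for (;;) {
            int x = dist_(rng_);
            if (seen_.insert(x).second)  // .second is true for new keys
                return x;
        }
    }
};
```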

1 Answer

0

You can essentially do the same as for very small n, but make the duplicate check more efficient. The naïve method of checking whether you've already generated a number is a linear search through the list of previously generated values, which makes the whole algorithm O(n^2). For an unknown n, keep the previously generated values in a sorted structure instead, so that each duplicate check is a more efficient search; that reduces the total time to O(n log n).
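
For instance, a sketch using a self-balancing tree (std::set); the function name and signature are just illustrative:

```cpp
#include <random>
#include <set>
#include <vector>

// Keep earlier values in a self-balancing tree (std::set), so each
// duplicate check and insertion costs O(log n): O(n log n) in total.
std::vector<int> sample_sorted(int n, int N, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, N);
    std::set<int> seen;
    std::vector<int> out;
    while ((int)out.size() < n) {
        int x = dist(rng);
        if (seen.insert(x).second)  // insert succeeds only for new values
            out.push_back(x);
    }
    return out;
}
```

A hash set (std::unordered_set) would bring the expected cost of each check down to O(1), as the comments below note.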

bames53
  • Inserting a value into a sorted array is O(n) time though. And if n is a large enough fraction of N that duplicates turn up often then an exponential term appears in the time complexity to account for the time spent rerunning it, and will dominate it. – j_random_hacker Oct 31 '13 at 15:28
  • @j_random_hacker: The array doesn't need to be sorted. It could just as easily be a hash table. – rici Oct 31 '13 at 15:31
  • @j_random_hacker so don't use an array. A tree can have O(log n) searching and insertion and is easy to keep sorted. – bames53 Oct 31 '13 at 15:31
  • @bames53: I would suggest changing your answer to say that, but I don't think it addresses the main problem I identified -- the exponential time complexity that results when already-selected numbers become sufficiently dense. – j_random_hacker Oct 31 '13 at 15:34
  • @j_random_hacker: You can exponentially rehash a hash table resulting in amortized linear insertion cost, just like a vector. Random numbers are *ideal* as hash-table keys, too. – rici Oct 31 '13 at 15:41
  • @rici: Sorry, got my n and N confused... Serious caffeine shortage here. Comment deleted. – j_random_hacker Oct 31 '13 at 15:43