4

I want to generate n different numbers between 1 and N (of course n <= N). N could be very large. If n is very small, one efficient way is to generate a number and compare it against the set of numbers we already have, to make sure it's new. That takes O(n^2) time and O(n) memory. If n is quite large, we can use the Fisher–Yates shuffle algorithm to generate a random permutation and stop after n steps. That takes O(n) time, but it also requires O(N) memory.
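For concreteness, here is a rough sketch of both approaches (the function names and signatures are mine, just for illustration):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Naive rejection sampling: O(n^2) time, O(n) memory.
std::vector<int> sample_rejection(int n, int N, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, N);
    std::vector<int> out;
    while ((int)out.size() < n) {
        int x = dist(rng);
        // Linear scan over everything generated so far.
        if (std::find(out.begin(), out.end(), x) == out.end())
            out.push_back(x);
    }
    return out;
}

// Truncated Fisher-Yates shuffle: only n swap steps, but O(N)
// memory because the whole range 1..N has to be materialized.
std::vector<int> sample_fisher_yates(int n, int N, std::mt19937& rng) {
    std::vector<int> all(N);
    std::iota(all.begin(), all.end(), 1);  // fill with 1, 2, ..., N
    for (int i = 0; i < n; ++i) {
        std::uniform_int_distribution<int> dist(i, N - 1);
        std::swap(all[i], all[dist(rng)]);
    }
    all.resize(n);  // the first n entries are the sample
    return all;
}
```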

Here is the question: what can we do if we do not know in advance how large n is? I would like an algorithm that uses only O(n) memory and stops after O(n) time. Is that possible?

nicolas.wong
  • That's a pretty poor duplicate - N there is 1000, here could be "very large". – jrok Oct 31 '13 at 15:14
  • @j_random_hacker That uses O(N) memory (not O(n)). – Bernhard Barker Oct 31 '13 at 15:14
  • @jrok: Fair enough, close vote retracted. I do note however an O(1)-space solution on that page: http://stackoverflow.com/a/202225/47984. – j_random_hacker Oct 31 '13 at 15:18
  • Jerry Coffin's dupe is a much better fit for this -- it's completely unbiased, and if a tree structure is used to hold the numbers chosen then it's O(n)-space and O(log n) time to select each element. I doubt any data structure can improve this to O(1) time to select each element. – j_random_hacker Oct 31 '13 at 15:24
  • How can you not know how large `n` is, yet perform the calculation? What is wrong with an algorithm that says "oh look, n is 5. I will do this..." – Floris Oct 31 '13 at 15:27
  • @Floris: The way I interpret it is that they want an *online* algorithm -- i.e. one where it's always possible to cheaply add a new, distinct sample later on. – j_random_hacker Oct 31 '13 at 15:28
  • @j_random_hacker: If you implement the set as a hash table you can get O(1) (at least for the expected time) -- see the sketch after these comments. – Jerry Coffin Oct 31 '13 at 15:45
  • @JerryCoffin: You're right, I was getting my n and N confused... but somehow only w.r.t. the hashtable, not the tree! – j_random_hacker Oct 31 '13 at 15:47
  • @j_random_hacker: Happens to the best of us (well, I guess I don't know about the best, but I've seen it happen to a lot better than me, anyway). – Jerry Coffin Oct 31 '13 at 15:50
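
A minimal sketch of the online, hash-table approach discussed in the comments above (the class name and its interface are illustrative only):

```cpp
#include <random>
#include <unordered_set>

// Values drawn so far live in a hash set, so each duplicate check is
// O(1) expected. While n stays well below N (say n <= N/2), each call
// needs only a constant expected number of retries.
class OnlineSampler {
    std::mt19937 rng_{std::random_device{}()};
    std::uniform_int_distribution<int> dist_;
    std::unordered_set<int> seen_;
public:
    explicit OnlineSampler(int N) : dist_(1, N) {}
    int next() {  // draw one more value, distinct from all earlier ones
        for (;;) {
            int x = dist_(rng_);
            if (seen_.insert(x).second)  // .second is true for new keys
                return x;
        }
    }
};
```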

1 Answer

0

You can essentially do the same as for very small n, but make the duplicate check more efficient. The naïve method of checking whether you've already generated a number is a linear search through the list of previously generated values, which makes the whole algorithm O(n^2). For an unknown n, keep the previously generated values in a sorted structure instead, so that each duplicate check is a more efficient search; that reduces the total time to O(n log n).
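
For instance, a sketch using a self-balancing tree (std::set); the function name and signature are just illustrative:

```cpp
#include <random>
#include <set>
#include <vector>

// Keep earlier values in a self-balancing tree (std::set), so each
// duplicate check and insertion costs O(log n): O(n log n) in total.
std::vector<int> sample_sorted(int n, int N, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, N);
    std::set<int> seen;
    std::vector<int> out;
    while ((int)out.size() < n) {
        int x = dist(rng);
        if (seen.insert(x).second)  // insert succeeds only for new values
            out.push_back(x);
    }
    return out;
}
```

A hash set (std::unordered_set) would bring the expected cost of each check down to O(1), as the comments below note.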

bames53
  • Inserting a value into a sorted array is O(n) time though. And if n is a large enough fraction of N that duplicates turn up often then an exponential term appears in the time complexity to account for the time spent rerunning it, and will dominate it. – j_random_hacker Oct 31 '13 at 15:28
  • @j_random_hacker: The array doesn't need to be sorted. It could just as easily be a hash table. – rici Oct 31 '13 at 15:31
  • @j_random_hacker so don't use an array. A tree can have O(log n) searching and insertion and is easy to keep sorted. – bames53 Oct 31 '13 at 15:31
  • @bames53: I would suggest changing your answer to say that, but I don't think it addresses the main problem I identified -- the exponential time complexity that results when already-selected numbers become sufficiently dense. – j_random_hacker Oct 31 '13 at 15:34
  • @j_random_hacker: You can exponentially rehash a hash table resulting in amortized linear insertion cost, just like a vector. Random numbers are *ideal* as hash-table keys, too. – rici Oct 31 '13 at 15:41
  • @rici: Sorry, got my n and N confused... Serious caffeine shortage here. Comment deleted. – j_random_hacker Oct 31 '13 at 15:43