-3

I need to generate a random string 8 digit long and next time when I generate the random number make sure it was not already generated. If we simply think I have to store every confirmation number generated and every time I try to generate a new one, verify that it was not already generated by looking at the stored values.

In my case the numbers are stored in database.

But if the already generated numbers are a lot, it can take significant time to validate that. What is the best way to approach this problem? This should work something like youtube video Id. It probably never happens that by chance youtube video id generator generates same video id twice. So they are definitely verifying that each time they generate a new videoid that it wasn't already generated and used for some other video. There can be a lot of them.

By feasible amount of time I mean something that would be useful to use from the website. If for a user of a website, I am asking certain information and creating a unique identifier for him/her, then I don't want the website to be slow because I am verifying that the randomly generated identifier doesn't already exist in the database.

Hem Acharya
  • 258
  • 3
  • 15
  • 2
    Not already generated in this run of the program on this machine? On this machine, at any time, by any instance of the program? Anywhere in the world by any instance of the program? Is a feasible amount of time a millisecond, a year, a century? This question is far, far too vague to answer. – Eric Lippert Mar 28 '17 at 21:18
  • 1
    Is your number required to be random for security reasons? Does it actually need to be *random*? What properties must the number actually have, aside from its length? Could you for instance use the multiplicative inverse technique to generate unique numbers by mapping 1, 2, 3, ... onto "random looking" numbers? https://ericlippert.com/2013/11/14/a-practical-use-of-multiplicative-inverses/ – Eric Lippert Mar 28 '17 at 21:21
  • Also, does it need to be a string, or can it be an int from 0 to 99,999,999? You refer to the string as a number. Using `int` will take up way less memory. – JamesFaix Mar 28 '17 at 21:24
  • @Eric I am elaborated the question to clarify my intent. Let me know if it is still not clear – Hem Acharya Mar 29 '17 at 01:59
  • what percentage of the hundred million possibilities do you think youll use up in the long run? – Eric Lippert Mar 29 '17 at 06:19
  • The number can be used up quickly. Idea is to treat it as unique until some point and then add additional parameters like date after that to verify that it is unique for those parameters. I can't exactly tell you the number but it can reach few millions in less than a year. – Hem Acharya Mar 29 '17 at 15:55

1 Answers1

-1

If you are using a random generator, and you need to check to make sure each number/string is unique, you can't just pick and check for uniqueness in a loop. This will always take longer the more numbers you pick, and there is no reason that you can't just keep randomly picking 1 forever.

The only way to really get around the problem of uniqueness checks taking arbitrarily long is to create a shuffled list of all possible results and then removing one at a time. You can generate the list in sequence and then use the Fisher Yates Shuffle to shuffle the list relatively efficiently.

The list of all 8 digits strings is quite long, but that is the safe way, and it will get smaller as you go on.

JamesFaix
  • 8,050
  • 9
  • 37
  • 73
  • 1
    The *only* way? Do you suppose that this is how GUIDs are guaranteed to be unique? – Eric Lippert Mar 28 '17 at 23:50
  • *“there is no reason that you can't just keep randomly picking 1 forever”* There is: probability. – Ry- Mar 29 '17 at 00:17
  • Man, you people are really picky about wording. By "no reason" I meant the process is not deterministic and can continue for an arbitrarily long period of time. – JamesFaix Mar 29 '17 at 12:50
  • @Ryan: "a good random number generator will also produce sequences that look nonrandom to the human eye and which also fail any statistical tests that we might expose it to" - from [RANDOM.ORG - Statistical Analysis](https://www.random.org/analysis/) – Luca Cremonesi Mar 29 '17 at 16:39
  • @LucaCremonesi: Not sure what you mean by this. – Ry- Mar 29 '17 at 16:40
  • @Ryan: I meant that an arbitrarily long sequence of the same repeating number is perfectly possible, and, although improbable, it has the exactly same probability of any other sequence of the same length (assuming a uniform distribution). – Luca Cremonesi Mar 29 '17 at 16:47
  • @LucaCremonesi: “exactly same probability of any other sequence of the same length” is irrelevant; the important part is “although improbable”. The longer the sequence, the less probable. “forever” is pretty long and therefore pretty improbable, to the point you don’t need to worry about it at all. (Lots of cryptography depends on things being just pretty improbable.) – Ry- Mar 29 '17 at 16:49
  • @Ryan: In the case of trying to naively generate random unique numbers over time, the probability of a duplicate number grows over time to 1. This is what is of importance here. – JamesFaix Mar 29 '17 at 17:58
  • @JamesFaix: Right, so why not just delete the incorrect “there is no reason that you can't just keep randomly picking 1 forever” and leave the correct “This will always take longer the more numbers you pick”? – Ry- Mar 30 '17 at 01:08