2

My goal is to implement a test for a programming exercise. The exercise requires a method for generating random combinations of two uppercase letters and three digits (in that order, eg: QF354). The combinations must be random and non repeating.

How can I test if the method was implemented right?

I was thinking about generating a group of samples and to check for any repeated value, but the number of permutations is too large (p = 26*26*10*10*10 = 676000) for quick tests.

I don't know how big my group of samples should be to affirm (at least with high confidence) that the values are non-repeating. Alternatives are also welcome.

cdpaiva
  • 93
  • 1
  • 7
  • You can apply the same technique as here https://stackoverflow.com/questions/2130621/how-to-test-a-random-generator – Arthur Klezovich Nov 07 '21 at 12:26
  • 1
    @ArthurKlezovich That doesn’t seem to test the non-repeat? – Ole V.V. Nov 07 '21 at 12:47
  • 1
    How sure do you need to be? Look at [Birthday problem](https://en.wikipedia.org/wiki/Birthday_problem) and make your own calculations of the relation between sample size and probability. – Ole V.V. Nov 07 '21 at 12:49
  • 2
    @OleV.V. I coauthored a paper back in the 1980's applying that concept to construct an easy test of independence for PRNGs which output their entire state (which was almost all PRNGs in the 1980's). If there are `poolsize` possible values and you haven't seen any duplicates by `3*sqrt(poolsize)`, the probability of non-independence is > 0.99. This can't be flipped to check that there won't be duplicates, though, because there could be cycles with `cycle_length > 3*sqrt(poolsize)`. The calculations you refer to assume independent and uniformly distributed outcomes. – pjs Nov 07 '21 at 19:30
  • How many of the 676000 possibilities are you actually going to generate? – pjs Nov 08 '21 at 14:50

1 Answers1

0

@OleV.V. The material you suggested was enough for me to find a solution.

This was the approximation I used:

p(n,d) = 1 - e^(-(n²/2d);

Where n is the population size and d the sample size.

For my problem, the probability of collisions asymptotically reaches 1 at about 5000 samples. I did some tests with a few different solutions and so far all the results are as expected.

Thanks for the comments!

cdpaiva
  • 93
  • 1
  • 7
  • 2
    While your answer might be correct, it cannot be read in night mode since your formulas are here black on dark grey. Please edit! – Reinhard Männer Nov 08 '21 at 22:09
  • Ty, just edited it. – cdpaiva Nov 08 '21 at 22:49
  • My concern would be whether you can prove that your method achieves the full cycle of 676000. If your algorithm yields two subcycles of of length 338000, or four of length 169000, or..., then you could be in for a nasty surprise long before you hit 676000. Even if the math is correct, you've got a problem if the underlying assumption of a full cycle is wrong. That's part of why I asked about how many values you actually need in a comment on the question – pjs Nov 09 '21 at 02:57