How to write a test case to check if numbers in an array are unique when the array is created randomly?

Question

I'm writing a specification for a lottery function that should create an array of seven (random) numbers between 1 and 49. The numbers in the array must be unique, a result like [1, 2, 3, 4, 5, 6, 6] would be invalid.

How can I make sure that the test is not green by chance?

One of my tests checks if the numbers in the array are unique. Now it seems like I have a "flaky" spec, because it can be red or green (randomly).

I "solved" the problem by just running the test case multiple times but I'm wondering if there is a more elegant solution. I want a secure way to check whether lottery() is reliably avoiding duplicates.

The test case (with test repetition):

it('TC05 - lottery() each array item must be a unique number', () => {

    // function to check if an array has duplicates
    const hasDuplicates = arr => {
        return (new Set(arr)).size !== arr.length;
    }

    // run check for duplicates x times
    const REPETITIONS = 32
    let seriesOfTests = []
    for (let i = 0; i < REPETITIONS; i++) {
        if (hasDuplicates(lottery())) {
            seriesOfTests.push(true)
        } else {
            seriesOfTests.push(false)
        }
    }

    // check series to ensure each single test passed
    let seriesSummery = seriesOfTests.includes(true)

    // test case
    expect(seriesSummery).must.be.false()
})

The lottery function (with if statement to avoid duplicates):

// lottery returns an array consists of 7 random numbers between 1 and 49
const lottery = () => {
    let result = []

    for (let i = 0; i < 7; i++) {
        const newNbr = randomNbr(49)

        if (result.find(el => el === newNbr) === undefined) {
            result.push(newNbr)
        } else {
            i -= 1
        }
    }
    return result
}

just fill an array and shuffle it [Sampling a random subset from an array](https://stackoverflow.com/questions/11935175/sampling-a-random-subset-from-an-array) — pilchard, Jul 28 '23 at 11:23
@pilchard that's an _implementation_, but the question about the _test_ would still remain. — jonrsharpe, Jul 28 '23 at 11:24
@jonrsharpe fair enough, but by improving implementation one simplifies testing. A shuffled subset of a known unique array will always be unique — pilchard, Jul 28 '23 at 11:24
@pilchard not really, you test the _behaviour_ not the implementation. Good tests would let you confidently switch between the current implementation and your suggestion. — jonrsharpe, Jul 28 '23 at 11:25
*'... I want a secure way to check whether the if statement in lottery() is reliable avoiding duplicates?'* is a test of a specific implementation, one shouldn't need to know the internals to test the result. — pilchard, Jul 28 '23 at 11:27
@pilchard please answer my question from the spec/testing perspective. I'm pretty sure my lottery function is solid but I'm having a hard time defining a solid test case. — spacePirate, Jul 28 '23 at 11:30
[Here's](https://github.com/textbook/rps-api/blob/9dc5ae162d90397ad57702b81c4013611c8afcbc/src/rpsService.test.js#L65-L79) an example, from https://blog.jonrshar.pe/2021/Apr/10/js-tdd-api.html. It tests the _properties_ of the code, not the implementation. Yes, it's feasible that a "wrong" implementation could pass those tests, but it's about _confidence_ in the implementation. Your properties would be e.g. "seven numbers" and "all unique". — jonrsharpe, Jul 28 '23 at 11:31
Thanks @jonrsharpe . So your answer to the problem would also be repetition, right? — spacePirate, Jul 28 '23 at 11:36
I think the answer to your question is that tests cannot check the generic case. Only specific cases. So there is no way to test it always works without testing *every* specific case. — Rocky Sims, Jul 28 '23 at 11:38
That's what [property testing](https://en.wikipedia.org/wiki/Software_testing#Property_testing) _is_. If you want something more structured, try https://fast-check.dev/. — jonrsharpe, Jul 28 '23 at 11:47
`i -= 1` eek, altering the for index seems wrong to me, it might be ok, but I would assume it gives the V8 optimiser some funky branches to work on. I would say a while loop would feel less hacky. — Keith, Jul 28 '23 at 11:50
`if (result.find(el => el === newNbr) === undefined)` is a complicated way to write `if (! result.includes(newNbr))`. It runs faster and clearly describes the intention. Read about [`Array.includes()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/includes). — axiac, Jul 28 '23 at 11:51
Or just do it with a Set, but the point is once you have tests in place you are free to refactor. — jonrsharpe, Jul 28 '23 at 12:02
Why does one not [implement in first place the function in a way that it creates an array of 7 unique/distinct integers out of the 1 to 49 range](https://stackoverflow.com/questions/69649671/create-a-new-array-from-a-random-generator/69651414#69651414)? And the check for such a requirement/result would be `(new Set(randomUniqueIntArray)].size === randomUniqueIntArray.length)` or `(new Set(randomUniqueIntArray)].size === 7)`. — Peter Seliger, Jul 28 '23 at 13:40
@PeterSeliger because the idea of TDD is to describe a specification/behavior regardless of the implementation. You can't just say I've made sure on the implementation side that this will not happen. — spacePirate, Jul 28 '23 at 14:38
@spacePirate ... I'm totally aware of that ... then just go with the suggested uniqueness comparison. It of cause needs to be accompanied by an additional check which assures just integer values in between 1 to 49 including ... `new Set(intArray)].size === 7 && Math.max(...intArray) <= 49 && Math.min(...intArray) >= 1`. — Peter Seliger, Jul 28 '23 at 14:50

score 0 · Answer 1 · answered Jul 28 '23 at 12:18

TDD + random is hard, and has been hard for a long time.

The only general approach that I've found satisfactory is to refactor the problem into three pieces

a deterministic function that takes a sequence of bits and computes a value
a source of random bits
a function that combines the two

The first function is something that we can drive with tests. The other two parts are coupled to something that is harder to constrain well, so we compensate by making certain that code is "so simple that there are obviously no deficiencies".

For your specific problem: there are only (49 choose 7) arrays that satisfy your constraint. So a function that "looks up" a constraint satisfying example by index, combined with an source of random values in range (49 choose 7) gives you what you need.

A convenient way to think about this is to imagine our collection of possible arrays is a sequence in "squashed order", and our (randomly generated) index is used to pull that specific solution out of the sequence.

Now, you probably don't want to have (49 choose 7) arrays just hanging around in memory, so what you really want is a generator function that produces the answer given an index in the acceptable range.

The good news: that generator function isn't too difficult to implement, once you have studied the concept. The explanation I learned from is here with another take on the same problem here.

score 0 · Answer 2 · answered Aug 13 '23 at 19:00

The only thing I can think of to test a random behavior is running the tests repeatedly until you have some confidence that it wouldn't fail. If I want to test a fair coin, I have to flip it large enough number of times to see that it is "almost" 50% heads and 50% tails.

In the same sense, if I want to ensure this array is unique, to run the check many times in each test run. That should increase our confidence, but not up to 100%. You could run the check 999,999,999, which only fails the 1,000,000,000th time, but that is highly unlikely.

Depending on the business use case, that very unlikely chance can be very expensive. Like, the uniqueness of the array might fail once every million years, but if it does, the company will not recover or may face legal consequences, etc. In that case, I'd suggest putting an assertion on the uniqueness of the array. Test to ensure this assertion will be triggered if the array is not unique. This will give you the production safety net if running the uniqueness test 1000 times on each run doesn't cover a corner case.

Finally, I'd suggest using a Set instead of an Array. Not only would it guarantee uniqueness, but also convey the intent better. What would be at play here is the length of the set.

The implementation should be less error-prone. We will add randomly generated values to the set, and when the set reaches the desired size, we're done. This is a more deterministic solution.

axiac · Answer 3 · 2023-07-28T12:25:18.390

You need to replace the function that generates the random number with a test double (also incorrectly known as mock). Configure the "mock", which is a test stub in fact, to return a certain sequence of numbers (they won't be random, you decide what values to return) then verify that the list of random numbers generated by function lottery() contains the numbers that you expect it to contain, given the "random" numbers generated by the test stub.

A quick example, using Jest:

// I assume that the function `randomNbr()` is exported by file `randomNbr.js`
import { randomNbr } from 'randomNbr';

jest.mock('randomNbr');

it('generates a list of unique values', () => {
  randomNbr.mockReturnValueOnce(7);
  randomNbr.mockReturnValueOnce(2);
  randomNbr.mockReturnValueOnce(1);
  randomNbr.mockReturnValueOnce(7);
  randomNbr.mockReturnValueOnce(6);
  randomNbr.mockReturnValueOnce(3);
  randomNbr.mockReturnValueOnce(3);
  randomNbr.mockReturnValueOnce(7);
  randomNbr.mockReturnValueOnce(2);
  randomNbr.mockReturnValueOnce(5);
  randomNbr.mockReturnValueOnce(9);
  randomNbr.mockReturnValueOnce(2);
  randomNbr.mockReturnValueOnce(8);
  // ... put here at least the number of unique values that `lottery()` returns

  const numbers = lottery();

  expect(numbers).toEqual([7, 2, 1, 6, 3, 5, 9]);
});

Given that the input is known (the values returned by randomNbr() on each call) and the algorithm requires to generate 7 random numbers, without repetitions, the expected output is also known.

After you have the test green you can refactor the lottery() function to improve its code.

Now it uses Array.find() instead of Array.includes() to find if the list contains a certain value. Array.includes() is faster and it expresses better the code intention. The reader can understand the code that uses Array.includes() without reading the documentation; they have to read the documentation to find out what Array.find() returns and when it returns undefined.

Also, the for loop and i = -1 are confusing and the for loop is even incorrect. If randomNbr() generates duplicate numbers, lottery() produces a list that contains more than 7 values. If randomNbr(), for some reason, gets stuck and always returns the same value (as it happens in the test above), lottery() never completes.

Let's try to write the code starting from the requirements. The lottery() function must:

return 7 values;
the values must be random;
the values must be distinct.

The third requirement tells us that, sometimes, it has to generate more than 7 random values. The for loop capped to 7 iterations is incorrect. Let's generate random distinct values until we have 7 of them.

// using `const` and arrow functions for anything else than inline callbacks
// is a stupid fashion that has many drawbacks and no benefit
function lottery() {
  // start with an empty list
  let result = []

  do {
    // generate a random number
    const nb = randomNbr(49);

    // check for duplicates
    if (result.includes(nb)) {
      // generated a duplicate, ignore it, restart the loop
      continue;
    }

    // it's good, keep it in the list
    result.push(nb)

    // stop when the list reaches the desired size
  } while (result.length < 7);

  return result
}

How to write a test case to check if numbers in an array are unique when the array is created randomly?

3 Answers3