Let's say I have a Python program that takes ~40 user inputs and returns a prediction of the user's lifespan. The inputs are mainly categorical or finite, such as sex, smoking status, and birth year.

I want to maximize my test coverage by testing all the acceptable values for each field, such as sex: ['Male', 'Female', None]. Is there a good way to do this without dozens of nested for loops, for example with an itertools function? I'm thinking of something like scikit-learn's grid search, where you list the acceptable values and it checks all possible combinations as part of hyperparameter optimization.

I want to avoid:

for sex in ['male', 'female', None]:
    for smoking_status in ['smoker', 'non-smoker', None]:
        for birth_year in [1900, ..., 2022, None]:
            assert myfunc(sex, smoking_status, birth_year) == trueOutput

Assume trueOutput is dynamic and always gives the correct value (I plan to cross-reference the Python output against an Excel spreadsheet, rewriting the Excel inputs for every test case to get the updated output). I also plan to write every possible test case to a JSON file representing a specific user's data, so I can re-test the cases that failed.

mkrieger1
Benjamin Luo

2 Answers

You want to use itertools.product like so:

import itertools

sex = ['m', 'f', None]
smoking = ['smoker', 'non-smoker', None]
birth = [1999, 2000, 2001, None]

for item in itertools.product(sex, smoking, birth):
    print(item)

To pass the arguments to your function, unpack each tuple with the * operator (or unpack it in the for statement):

import itertools

sex = ['m', 'f', None]
smoking = ['smoker', 'non-smoker', None]
birth = [1999, 2000, 2001, None]

# unpack each tuple into positional arguments
for item in itertools.product(sex, smoking, birth):
    assert myfunc(*item) == trueOutput
# or unpack into named loop variables
for se, sm, b in itertools.product(sex, smoking, birth):
    assert myfunc(se, sm, b) == trueOutput
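
As a rough, self-contained sketch of the loop above: myfunc and expected below are hypothetical stand-ins (the real prediction function and the trueOutput lookup are not shown in the question). It also counts the combinations up front and collects failing inputs instead of stopping at the first failed assert, which is a design choice of this sketch, not something the answer prescribes.

```python
import itertools
import math

sex = ['m', 'f', None]
smoking = ['smoker', 'non-smoker', None]
birth = [1999, 2000, 2001, None]


def myfunc(sex, smoking, birth):
    # Hypothetical stand-in for the real prediction function.
    return (sex, smoking, birth)


def expected(sex, smoking, birth):
    # Hypothetical stand-in for the asker's trueOutput lookup.
    return (sex, smoking, birth)


# product() yields one tuple per combination, so the total is the
# product of the list lengths.
total = math.prod(len(values) for values in (sex, smoking, birth))

# Collect every failing combination rather than aborting on the first one.
failures = [
    combo
    for combo in itertools.product(sex, smoking, birth)
    if myfunc(*combo) != expected(*combo)
]

print(f"{total} combinations, {len(failures)} failures")
```

Computing the total first is a cheap way to notice when the full product is too large to run exhaustively.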
Schottky

itertools.product() is exactly what you want, but for additional functionality, collect the choices for each field into a dict:

category_inputs_mapping = {
    "gender":     ("male", "female", ..., None),
    "birth_year": sorted(2022 - x for x in range(150)),
    ...
}

Then create a generator function which yields each possibility as a dict of keyword arguments:

import itertools
from typing import Dict

def gen_inputs_mapper(d: Dict):
    # opportunity to inspect dict
    for values in itertools.product(*d.values()):
        # opportunity to assert/inspect values
        # pair each key with its chosen value, in dict order
        yield dict(zip(d.keys(), values))

Finally, get each possibility in turn along with its index:

for index, possibility in enumerate(gen_inputs_mapper(category_inputs_mapping)):
    result = myfunc(**possibility)      # unpack dict to args
    assert result == true_output[index]  # nth index from computed sheet
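
Because each possibility is a plain dict, it also maps directly onto the JSON test-case files mentioned in the question. A minimal sketch, assuming a two-field mapping for brevity and a hypothetical myfunc (neither the real function nor the full field list appears in the post):

```python
import itertools
import json

# Small illustrative mapping; the real one would list every field.
category_inputs_mapping = {
    "sex": ("male", "female", None),
    "smoking_status": ("smoker", "non-smoker", None),
}


def gen_inputs_mapper(d):
    # Yield one keyword-argument dict per combination of values.
    for values in itertools.product(*d.values()):
        yield dict(zip(d.keys(), values))


def myfunc(sex, smoking_status):
    # Hypothetical stand-in for the real prediction function.
    return f"{sex}/{smoking_status}"


possibilities = list(gen_inputs_mapper(category_inputs_mapping))

# Serialize each case; None becomes JSON null and round-trips back to None.
serialized = [json.dumps(p) for p in possibilities]

# A failed case can later be reloaded and replayed on its own.
replayed = myfunc(**json.loads(serialized[0]))
```

Writing each string in serialized to its own file would give one JSON document per user case; the file naming is left out here because the post does not specify it.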
ti7
  • Do you know how to randomize the combinations? It turns out there are trillions of test cases and I want to select `1000` at random – Benjamin Luo Mar 10 '22 at 16:58
  • Certainly, but I think you want quite a different approach; use a formula like this with `random.sample(range(N), 1000)` https://stackoverflow.com/a/9944993/4541045 .. if the number is too large, collect the indices from `.randrange()`, discarding duplicates – ti7 Mar 10 '22 at 17:10
  • @BenjaminLuo I think this would make a good second question (please link it back to here so it's easy to find) – ti7 Mar 10 '22 at 17:11
  • Apologies for the delay, there were issues with creating the question so I waited it out, here it is: https://stackoverflow.com/questions/71429254/python-select-subset-of-test-cases-from-all-possible-combinations – Benjamin Luo Mar 10 '22 at 18:47
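
One way to read the sampling suggestion in the comments above, sketched under assumptions: the linked answer's exact formula is not reproduced here. Instead, a hypothetical helper combination_at decodes an index as a mixed-radix number, so the i-th tuple of itertools.product can be built without iterating over the earlier ones. The value lists and sample size are illustrative only.

```python
import itertools
import math
import random


def combination_at(index, choices):
    """Return the index-th tuple of itertools.product(*choices) directly."""
    picked = []
    # The last field varies fastest in product(), so decode from the right.
    for values in reversed(choices):
        index, position = divmod(index, len(values))
        picked.append(values[position])
    return tuple(reversed(picked))


choices = [
    ['m', 'f', None],
    ['smoker', 'non-smoker', None],
    [1999, 2000, 2001, None],
]
total = math.prod(len(values) for values in choices)

# Sanity check on this small example: decoding matches product() order.
assert combination_at(5, choices) == list(itertools.product(*choices))[5]

rng = random.Random(0)  # fixed seed so a failing sample can be reproduced
sample_size = 10        # the comment asks for 1000; kept small here
indices = rng.sample(range(total), sample_size)
sampled = [combination_at(i, choices) for i in indices]
```

random.sample draws distinct indices from the range without materializing it, so no combination is tested twice; for totals too large for a range, the comment's fallback of drawing with randrange() and discarding duplicates applies.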