
Let's say I generate some input NumPy array data using `np.random.normal()` in my test_func.py script, which uses pytest.

Now I want to call the func.py function that I am testing. How can I get testable results? If I set a seed in the test_func.py script, it isn't going to correspond to the random data that gets generated inside the func.py function, correct?

I want to be able to create some reference data in test_func.py and then test that the randomness generated in func.py is comparable to that reference data (thereby testing both the randomness and the functionality of the func.py function).

Thank you!

EDIT: Here is some sample code to describe my process:

```python
# func.py
import numpy as np

# Take a signal array, generate noise, and add the noise to the signal
def generate_random_noise(signal):
    noise = np.random.normal(0, 5, signal.shape)
    signal_w_noise = signal + noise
    return signal_w_noise
```


```python
# test_func.py
import pytest
import numpy as np

import func

def test_generate_random_noise():
    # create reference signal
    # ...
    np.random.seed(5)
    reference_noise = np.random.normal(0, 5, ref_signal.shape)
    ref_signal_w_noise = ref_signal + reference_noise

    # assert the function's output matches the manually created signal-plus-noise
    assert np.array_equal(func.generate_random_noise(ref_signal), ref_signal_w_noise)
```
Coldchain9
  • Don't invent unreproducible randomized data generation. Use [`hypothesis`](https://pypi.org/project/hypothesis/), esp. for `numpy` check out [Hypothesis for the Scientific Stack](https://hypothesis.readthedocs.io/en/latest/numpy.html). – hoefling Jul 06 '20 at 23:03

1 Answer


When testing code that uses randomness, you can take two approaches:

  • Use a known seed to ensure the function depending on the random distribution performs as expected: with a fixed seed you can compare the function's behavior to a behavior that is known in advance.
  • Validate the statistical behavior of the function depending on the random distribution. Here you need to do some math on the expected distribution of the function's results and use statistical metrics as pass/fail criteria, e.g. checking that the mean, skew, etc. of the tested function's output match their expected values. This can be done with a non-frozen seed, but many function calls need to be collected to have enough data for meaningful statistics.
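As a minimal sketch of the first approach (not your exact code): because `np.random.seed` resets NumPy's global stream, seeding immediately before drawing the reference noise and again immediately before calling the tested function makes both consume the identical draw. The `ref_signal` placeholder and the test name are my own choices:

```python
import numpy as np

def generate_random_noise(signal):
    # Same body as func.py: draw noise from the global stream and add it
    noise = np.random.normal(0, 5, signal.shape)
    return signal + noise

def test_generate_random_noise_seeded():
    ref_signal = np.zeros(10)  # placeholder reference signal (my assumption)
    # Draw the reference noise with a known seed...
    np.random.seed(5)
    expected = ref_signal + np.random.normal(0, 5, ref_signal.shape)
    # ...then re-seed so the tested function consumes the identical draw
    np.random.seed(5)
    result = generate_random_noise(ref_signal)
    assert np.array_equal(result, expected)
```

The key point is the second `np.random.seed(5)` call: without it, the tested function continues the global stream from where the reference draw left off and the arrays differ.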
Jean-Marc Volle
    Regarding your first bullet point. How do I get the same seed result from a function that is located in another script? I'll post some sample code in my original answer to see if that makes sense. I'm not quite sure how to set up my code to make it work. – Coldchain9 Jul 06 '20 at 14:07
  • You most probably need to modify your function so that it can be tested using a test provided seed. You could also create a singleton object that "generates" a seed that is either "pure random" or known in advance and have the test code configure this `Seed` so that it generates a known in advance seed or a random one. (The tested function would not be aware of this configuration). You can also have a look at dependency injection in python here: https://stackoverflow.com/questions/31678827/what-is-a-pythonic-way-for-dependency-injection – Jean-Marc Volle Jul 06 '20 at 14:12
  • I added some sample code of what I am trying to do. How would I do what you described, syntactically? – Coldchain9 Jul 06 '20 at 14:17
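The refactor suggested in the comments above (letting the test provide the source of randomness) could look like the following sketch. The `rng` parameter name and the use of NumPy's `default_rng` Generator API are my assumptions, not the original poster's code; it also shows the answer's second, statistical approach:

```python
import numpy as np

def generate_random_noise(signal, rng=None):
    # Optionally injected Generator; falls back to fresh entropy in production
    rng = rng if rng is not None else np.random.default_rng()
    return signal + rng.normal(0, 5, signal.shape)

def test_reproducible_with_injected_rng():
    # First approach: inject a seeded Generator, rebuild the expected result
    # with an identically seeded Generator, and compare exactly.
    ref_signal = np.zeros(10)
    result = generate_random_noise(ref_signal, rng=np.random.default_rng(5))
    expected = ref_signal + np.random.default_rng(5).normal(0, 5, ref_signal.shape)
    assert np.array_equal(result, expected)

def test_noise_statistics():
    # Second approach: check distribution properties over many samples
    noise = generate_random_noise(np.zeros(100_000))
    assert abs(noise.mean()) < 0.1        # expected mean 0
    assert abs(noise.std() - 5.0) < 0.1   # expected std 5
```

This avoids touching global state entirely: production callers omit `rng` and get fresh entropy, while tests pass a seeded Generator for exact reproducibility.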