
I've started using googletest to implement tests and stumbled across this quote in the documentation regarding value-parameterized tests:

  • You want to test your code over various inputs (a.k.a. data-driven testing). This feature is easy to abuse, so please exercise your good sense when doing it!

I think I'm indeed "abusing" the system when doing the following and would like to hear your input and opinions on this matter.

Assume we have the following code:

template<typename T>
struct SumMethod {
     T op(T x, T y) { return x + y; }   
};

// optimized function to handle different input array sizes 
// in the most efficient way
template<typename T, class Method> 
T f(T input[], int size) {
    Method m;
    T result = (T) 0;
    if(size <= 128) {
        // use m.op() to compute result etc.
        return result;
    }
    if(size <= 256) {
        // use m.op() to compute result etc.
        return result;
    }
    // ...
}

// naive and correct, but slow alternative implementation of f()
template<typename T, class Method>
T f_alt(T input[], int size);

OK, so with this code it certainly makes sense to test f() (by comparing it with f_alt()) on different input array sizes of randomly generated data, in order to check the correctness of all the branches. On top of that, I have several structs like SumMethod, MultiplyMethod, etc., so I'm also running quite a large number of tests for different types:

typedef MultiplyMethod<int> MultInt;
typedef SumMethod<int> SumInt;
typedef MultiplyMethod<float> MultFlt;
// ...
// (extra parentheses keep the commas inside the template argument
// lists from confusing the assertion macros)
ASSERT_EQ((f<int, MultInt>(int_in, 128)), (f_alt<int, MultInt>(int_in, 128)));
ASSERT_EQ((f<int, MultInt>(int_in, 256)), (f_alt<int, MultInt>(int_in, 256)));
// ...
ASSERT_EQ((f<int, SumInt>(int_in, 128)), (f_alt<int, SumInt>(int_in, 128)));
ASSERT_EQ((f<int, SumInt>(int_in, 256)), (f_alt<int, SumInt>(int_in, 256)));
// ...
const float ep = 1e-6f;
ASSERT_NEAR((f<float, MultFlt>(flt_in, 128)), (f_alt<float, MultFlt>(flt_in, 128)), ep);
ASSERT_NEAR((f<float, MultFlt>(flt_in, 256)), (f_alt<float, MultFlt>(flt_in, 256)), ep);
// ...
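
For what it's worth, here is a minimal sketch of how one such size sweep could be written as an actual googletest value-parameterized test (the fixture name and size list are arbitrary, and it assumes f(), f_alt() and SumInt from above are visible):

#include <vector>
#include <gtest/gtest.h>

class SumIntCompare : public ::testing::TestWithParam<int> {};

TEST_P(SumIntCompare, MatchesNaiveImplementation) {
    const int size = GetParam();
    std::vector<int> input(size, 1);  // stand-in; fill with the (pre-generated) test data instead
    ASSERT_EQ((f<int, SumInt>(input.data(), size)),
              (f_alt<int, SumInt>(input.data(), size)));
}

// newer googletest spells this INSTANTIATE_TEST_SUITE_P
INSTANTIATE_TEST_CASE_P(VariousSizes, SumIntCompare,
                        ::testing::Values(16, 100, 1000, 10000));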

Now of course my question is: does this make any sense and why would this be bad?

In fact, I have already found a "bug" this way: when running the tests with floats, f() and f_alt() gave different results with SumMethod due to rounding, which I could improve by presorting the input array, etc. From this experience I consider this actually to be somewhat good practice.

bbtrb

3 Answers


I think the main problem is testing with "randomly generated data". It is not clear from your question whether this data is re-generated each time your test harness is run. If it is, then your test results are not reproducible. If some test fails, it should fail every time you run it, not once in a blue moon, upon some weird random test data combination.

So in my opinion you should pre-generate your test data and keep it as a part of your test suite. You also need to ensure that the dataset is large enough and diverse enough to offer sufficient code coverage.
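
If you do want generated data, one simple compromise (just a sketch, not tied to any particular framework) is to draw it from a pseudo-random generator with a fixed, recorded seed, so every run sees exactly the same input:

#include <cstddef>
#include <random>
#include <vector>

// Deterministic "random" test data: the seed value is arbitrary, but keeping it
// fixed (or at least logging it) makes any failing run reproducible.
std::vector<int> make_test_input(std::size_t size, unsigned seed = 12345u) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(-1000, 1000);
    std::vector<int> data(size);
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = dist(gen);
    return data;
}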

Moreover, as Ben Voigt commented below, testing with random data alone is not enough. You need to identify the corner cases in your algorithms and test them separately, with data tailored specifically for these cases. However, in my opinion, additional testing with random data is also beneficial when/if you are not sure that you know all your corner cases: you may hit them by chance using random data.
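
For example (a sketch reusing f(), f_alt() and SumInt from the question), a dedicated corner-case test might hit the branch thresholds with deliberately simple, hand-picked data rather than random arrays:

#include <vector>
#include <gtest/gtest.h>

TEST(SumIntCornerCases, BranchThresholds) {
    // sizes chosen to sit exactly on and around the thresholds inside f()
    const int sizes[] = {1, 127, 128, 129, 255, 256, 257};
    for (int size : sizes) {
        std::vector<int> input(size, 1);  // hand-picked data, not random
        ASSERT_EQ((f<int, SumInt>(input.data(), size)),
                  (f_alt<int, SumInt>(input.data(), size))) << "size = " << size;
    }
}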

haimg
  • Randomly generated data is bad for two reasons -- first, because as you mentioned, the tests aren't reproducible. And secondly, because corner cases may not be covered by randomly generated data. Saving the random vectors does nothing for the second drawback. – Ben Voigt Oct 08 '11 at 18:13
  • @haimg - If you are doing black box testing, how do you know the algorithm used and its corner cases? :-) – Bo Persson Oct 10 '11 at 21:10
  • Well maybe I'm misreading the original question, but there is nothing there that says that the testing **must** not use the knowledge of the implementation. Moreover, http://en.wikipedia.org/wiki/Black_box_testing specifically says that "Elements at the edge of the domain are selected and tested", which essentially is testing corner cases. – haimg Oct 10 '11 at 21:21
  • +1 for "additional testing with random data is also beneficial when/if you are not sure that you know all your corner cases": we recently had a problem with an unidentified corner case, and I added some (pseudo) random test cases to try to detect any additional unidentified corner cases. – Raedwald Oct 13 '11 at 12:08
  • 3
    There is one simple solution to random data testing: make sure all your randomness is generated by the same seedable generator and store the seed for that generator. Random testing can be a great tool, but just like any other tool you have to know how to use it and what are the benefits and weaknesses. – KillianDS Oct 14 '11 at 13:28
  • Thank you for your extended answer and the discussion in the comments, especially the remarks about having a fixed data set for reproducible tests and separate tests for corner cases. I guess the tests with varying input array sizes in my example, which probe the different branches, could also be considered corner cases in some sense. – bbtrb Oct 14 '11 at 17:57

The problem is that you can't assert correctness on floats the same way you do with ints.

Check correctness to within a certain epsilon, i.e. a small allowed difference between the calculated and expected values. That's the best you can do. This is true for all floating-point numbers.
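
In googletest terms that means EXPECT_NEAR/ASSERT_NEAR with an explicit tolerance, or the *_FLOAT_EQ/*_DOUBLE_EQ macros, which compare to within 4 ULPs. A trivial sketch, with placeholder values:

#include <gtest/gtest.h>

TEST(FloatComparisonSketch, WithinTolerance) {
    const float expected = 0.3f;          // placeholder values; in the question these
    const float actual   = 0.1f + 0.2f;   // would be the results of f_alt() and f()
    EXPECT_NEAR(expected, actual, 1e-6f); // absolute tolerance, chosen per problem
    EXPECT_FLOAT_EQ(expected, actual);    // alternative: equal to within 4 ULPs
}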

I think I'm indeed "abusing" the system when doing the following

Did you think this was bad before you read that article? Can you articulate what's bad about it?

You have to test this functionality sometime. You need data to do it. Where's the abuse?

duffymo
  • Sure. I just forgot to put it correctly in the above example. Edited. Apart from that, I am more interested in the arguments against writing this kind of test. – bbtrb Oct 06 '11 at 15:36

One of the reasons why it could be bad is that data-driven tests are harder to maintain, and over a longer period of time it becomes easier to introduce bugs in the tests themselves. For details, look here: http://googletesting.blogspot.com/2008/09/tott-data-driven-traps.html

Also, from my point of view, unit tests are most useful when you are doing serious refactoring and you are not sure whether you have changed the logic in the wrong way. If a random-data test fails after that kind of change, you are left guessing: is it because of the data or because of your changes?

However, I think it can still be useful (much like stress tests, which are also not 100% reproducible). But if you are using a continuous integration system, I'm not sure whether data-driven tests with a huge amount of randomly generated data should be included in it. I would rather set up a separate job that periodically runs a lot of random tests at once (so the chance of discovering something bad is quite high every run), because that is too resource-heavy to be part of the normal test suite.

Piotr Kukielka