Unit testing handling of degraded network stack, file corruption, and other imperfections

Question

I'm primarily a C++ coder, and thus far, have managed without really writing tests for all of my code. I've decided this is a Bad Idea(tm), after adding new features that subtly broke old features, or, depending on how you wish to look at it, introduced some new "features" of their own.

But, unit testing seems to be an extremely brittle mechanism. You can test for something in "perfect" conditions, but you don't get to see how your code performs when stuff breaks. A for instance is a crawler, let's say it crawls a few specific sites, for data X. Do you simply save sample pages, test against those, and hope that the sites never change? This would work fine as regression tests, but, what sort of tests would you write to constantly check those sites live and let you know when the application isn't doing it's job because the site changed something, that now causes your application to crash? Wouldn't you want your test suite to monitor the intent of the code?

The above example is a bit contrived, and something I haven't run into (in case you haven't guessed). Let me pick something I have, though. How do you test an application will do its job in the face of a degraded network stack? That is, say you have a moderate amount of packet loss, for one reason or the other, and you have a function DoSomethingOverTheNetwork() which is supposed to degrade gracefully when the stack isn't performing as it's supposed to; but does it? The developer tests it personally by purposely setting up a gateway that drops packets to simulate a bad network when he first writes it. A few months later, someone checks in some code that modifies something subtly, so the degradation isn't detected in time, or, the application doesn't even recognize the degradation, this is never caught, because you can't run real world tests like this using unit tests, can you?

Further, how about file corruption? Let's say you're storing a list of servers in a file, and the checksum looks okay, but the data isn't really. You want the code to handle that, you write some code that you think does that. How do you test that it does exactly that for the life of the application? Can you?

Hence, brittleness. Unit tests seem to test the code only in perfect conditions(and this is promoted, with mock objects and such), not what they'll face in the wild. Don't get me wrong, I think unit tests are great, but a test suite composed only of them seems to be a smart way to introduce subtle bugs in your code while feeling overconfident about it's reliability.

How do I address the above situations? If unit tests aren't the answer, what is?

Edit: I see a lot of answers that say "just mock it". Well, you can't "just mock it", here's why: Taking my example of the degrading network stack, let's assume your function has a well defined NetworkInterface, which we'll mock. The application sends out packets over both TCP, and UDP. Now, let's say, hey, let's simulate 10% loss on the interface using a mock object, and see what happens. Your TCP connections increase their retry attempts, as well as increasing their back-off, all good practice. You decide to change X% of your UDP packets to actually make a TCP connection, lossy interface, we want to be able to be able to guarantee delivery of some packets, and the others shouldn't lose too much. Works great. Meanwhile, in the real world.. when you increase the number of TCP connections (or, data over TCP), on a connection that's lossy enough, you'll end up increasing your UDP packet loss, as your TCP connections will end up re-sending their data more and more and/or reducing their window, causing your 10% packet loss to actually be more like 90% UDP packet loss now. Whoopsie.

No biggie, let's break that up into UDPInterface, and TCPInterface. Wait a minute.. those are interdependent, testing 10% UDP loss and 10% TCP loss is no different than the above.

So, the issue is now you're not simply unit testing your code, you're introducing your assumptions into the way the operating system's TCP stack works. And, that's a Bad Idea(tm). A much worse idea than just avoiding this entire fiasco.

At some point, you're going to have to create a Mock OS, which behaves exactly like your real OS, except, is testable. That doesn't seem like a nice way forward.

This is stuff we've experienced, I'm sure others can add their experiences too.

I hope someone will tell me I'm very wrong, and point out why!

Thanks!

+1 because I'm interested, but one point. Unit tests aren't *the* answer. Nothing is *the* answer. There is no silver bullet which will solve all your testing dilemas. Each is only one tool of many, and you will always need to employ many to get maximum coverage. — John Dibling, Dec 29 '10 at 15:49
You are right testing is much more difficult and intellectual process than unit test software vendors try to present with their trivial examples. As Steve McConnell wrote in his book, instead of testing more, write better code. — Gene Bushuyev, Dec 29 '10 at 15:59
I completely concur with John Dibling. Unit tests will not make your software perfect. It is very helpful when used appropriately. Don't waste time on tests that aren't likely to be helpful in the real world. — Jay, Dec 29 '10 at 16:07
@Kim: There are many. As many as you can imagine, and 42 more. A few big ones are: scripted testing, monkey testing, alpha testing. There is an entire industry that does nothing but develop & sell some very expensive testing suites. — John Dibling, Dec 29 '10 at 16:30
@John Dibling: Bad question. What I should've asked: What are the methodologies that can automatically test failure cases like the above? (Updated my question a bit, btw). — Kim Sun-wu, Dec 29 '10 at 16:31
@Kim: For testing for ripple effects, I often like to run the old and new versions side-by-side, and check the results for differences. This checking can be automated. — John Dibling, Dec 29 '10 at 16:35
@Kim: For testing for network failures, I have set up the autobuilder to ship the app to a burn-in server where there are scripts that run the new app, and other programs (custom built) which create the failure conditions like network failures. — John Dibling, Dec 29 '10 at 16:36
@Kim: The file corruption testing can also be done in a similar way. All of these things can be tested during development by simply maintaining a library of files & scripts that simulate the desired effects. A bad file, a program which corrupts the network stack, etc. — John Dibling, Dec 29 '10 at 16:37
When I initially read the title I thought it said "How do you unit test the world?" — Amir Afghani, Dec 29 '10 at 17:45
In regards to your updated comment, at my workplace we actually *do* mock the OS, so it's not as ridiculous as it sounds. Of course we don't mock the *entire* OS... just the parts that our application is sensitive to and which are likely to be the source of errors. — JSBձոգչ, Dec 29 '10 at 21:25

score 14 · Answer 1 · answered Dec 29 '10 at 16:00

14

You start by talking about unit tests, then talk about entire applications; it seems you are a little confused about what unit testing is. Unit testing by definition is about testing at the most fine grained level, when each "unit" of the software is being tested. In common use, a "unit" is an individual function, not an entire application. Contemporary programming style has short functions, each of which does one well defined thing, which is therefore easy to unit test.

answered Dec 29 '10 at 16:00

Raedwald

46,613
43
151
237

It's not testing *my* code that I'm concerned about. I know *my* code will work given my *assumptions* are met. It's testing the assumptions that my code depends upon, and the subtle interactions therein that worries me greatly. – Kim Sun-wu Dec 29 '10 at 16:24
9

@Kim Sun-wu - then you're just using the wrong words. That's more like *integration* testing. – Daniel Earwicker Dec 29 '10 at 17:37
1

@Daniel Earwicker No, it's not integration testing, that's still much too low. Where the problem is that the outside world changes it cannot even be covered by the specification, so it's more like validation: "validation ensures that the product actually meets the user's needs" http://en.wikipedia.org/wiki/Verification_and_Validation_(software) – starblue Dec 31 '10 at 11:11

score 12 · Answer 2 · answered Dec 29 '10 at 16:17

what sort of tests would you write to constantly check those sites live?

UnitTests target small sections of code you write. UnitTests do not confirm that things are ok in the world. You should instead define application behavior for those imperfect scenarios. Then you can UnitTest your application in those imperfect scenarios.

for instance a crawler

A crawler is a large body of code you might write. It has some different parts, one part might fetch a webpage. Another part might analyze html. Even these parts may be too large to write a unit test against.

How do you test an application will do its job in the face of a degraded network stack? The developer tests it personally by purposely setting up a gateway that drops packets to simulate a bad network when he first writes it.

If a test uses the network, it's not a UnitTest.

A UnitTest (which must target your code) cannot call the network. You didn't write the network. The UnitTest should involve a mock network with simulated (but consistent each time) packet loss.

Unit tests seem to test the code only in perfect conditions

UnitTests test your code in defined conditions. If you're only capable of defining perfect conditions, your statement is true. If you're capable of defining imperfect conditions, your statement is false.

Daniel Earwicker · Accepted Answer · 2010-12-29T17:35:53.603

Work through any decent book on unit testing - you'll find that it's normal practise to write tests that do indeed cover edge cases where the input is not ideal or is plain wrong.

The most common approach in languages with exception handling is a "should throw" specification, where a certain test is expected to cause a specific exception type to be thrown. If it doesn't throw an exception, the test fails.

Update

In your update you describe complex timing-sensitive interactions. Unit testing simply doesn't help at all there. No need to introduce networking: just think of trying to write a simple thread safe queue class, perhaps on a platform with some new concurrency primitives. Test it on an 8 core system... does it work? You simply can't know that for sure by testing it. There are just too many different ways that the timing can cause operations to overlap between the cores. Depending on luck, it could take weeks of continuous execution before some really unlikely coincidence occurs. The only way to get such things right is through careful analysis (static checking tools can help). It's likely that most concurrent software has some rarely occuring bugs in it, including all operating systems.

Returning to the cases that can actually be tested, I've found integration tests to be often just as useful as unit tests. This can be as elaborate as automating the installation of your product, adding configurations to it (such as your users might create) and then "poking" it from the outside, e.g. automating your UI. This finds a whole other class of issue separate from unit testing.

Finglas · Answer 4 · 2010-12-29T16:38:12.240

It sounds as if you answered your own question.

Mocks/stubs are the key to testing difficult to test areas. For all of your examples, the manual approach of say creating a website with dodgy data, or causing network failure could be done manually. However it would be very difficult and tedious to do so, not something anyone would recommend. In fact, doing some would mean you are not actually unit testing.

Instead you'd use mock/stubs to pretend such scenarios have happened allowing you to test them. The benefit of using mocks is that unlike the manual approach you can guarantee that each time you run your tests the same procedure will be carried out. The tests in turn will be much faster and stable because of this.

Edit - With regards the updated question.

Just as a disclaimer my networking experience is very limited, therefore I can't comment on the technical side of your issues. However, I can comment on the fact you sound as if you are testing too much. In other words, your tests cover too much of a wide scope. I don't know what your code base is like but given functions/objects within that, you should still be able to provide fake input that will allow you to test that your objects/functions do the right thing in isolation.

So lets imagine your isolated areas work fine given the requirements. Just because your unit tests pass does not mean you've tested your application. You'll still need to manually test such scenarios you describe. In this scenario it sounds as if stress testing - limiting network resources and so on are required. If your application works as expected - great. If not, you've got missing tests. Unit testing (more in tie with TDD/BDD) is about ensuring small, isolated areas of your application work. You still need integration/manual/regression etc.. testing afterwards. Therefore you should use mocks/stubs to test your small, isolated areas function. Unit testing is more akin to a design process if anything in my opinion.

Stubs won't get you far. Nontrivial classes can only be reasonably tested in appropriate context. — Gene Bushuyev, Dec 29 '10 at 16:00
I've updated my question as to why I think this is a *horrible* idea. Please tell me I'm wrong. — Kim Sun-wu, Dec 29 '10 at 16:23

dietbuddha · Answer 5 · 2010-12-30T08:08:51.387

Integration Testing vs Unit Testing

I should preface this answer by saying I am biased towards integration tests vs unit tests as the primary type of test used in tdd. At work we also have some unit tests mixed in, but only as necessary. The primary reason why we start with an integration test is because we care more about what the application is doing rather than what a particular function does. We also get integration coverage which has been, in my experience, a huge gap for automated testing.

To Mock or Not, Why Not Do Both

Our integration tests can run either fully wired (to unmanaged resource) or with mocks. We have found that helps to cover the gap between real world vs mocks. This also provides us with the option to decide NOT to have a mocked version because the ROI for implementing the mock isn't worth it. You may ask why use mocks at all.

tests suite runs faster
guaranteed same response every time (no timeouts, unforeseen degraded network, etc)
fine-grained control over behavior

Sometimes you shouldn't write a test

Testing, any kind of testing has trade offs. You look at the cost to implement the test, the mock, variant tests, etc and weigh that against the benefits and sometime it doesn't make sense to write the test, the mock, or the variant. This decision is also made within the context of the kind of software your building which really is one of the major factor in deciding how deep and broad your test suite needs to be. To put it another way, I'll write a few tests for the social bacon meetup feature, but I'm not going to write the formal verification test for the bacon-friend algorithm.

Do you simply save sample pages, test against those, and hope that the sites never change?

Testing is not a panacea

Yes, you save samples (as fixtures). You don't hope the page doesn't change, but you can't know how and when it will change. If you have ideas or parameters of how it may change then you can create variants to make sure your code will handle those variants. When and if it does change, and it breaks, you add new samples, fix the problems and move on.

what sort of tests would you write to constantly check those sites live and let you know when the application isn't doing it's job because the site changed something, that now causes your application to crash?

Testing != Monitoring

Tests are tests and part of development (and QA), not for production. MONITORING is what you use in production to make sure your application is working properly. You can write monitors which should alert you when something is broken. That's a whole other topic.

How do you test an application will do its job in the face of a degraded network stack?

Bacon

If it were me I would have a wired and mocked mode for the test (assuming the mock was good enough to be useful). If the mock is difficult to get right, or if it's not worth it then I would just have the wired test. However, I have found that there is almost always a way split the variables in play into different tests. Then each of those tests are targeted to testing that vector of change, while minimizing all the other variability in play. The trick is to write the important variants, not every possible variant.

Further, how about file corruption?

How Much Testing

You mention that checksum being correct, but the file actually being corrupt. The question here is what is the class of software I'm writing. Do I need to be super paranoid about the possibility of a statistically small false positive or not. If I do, then we work to find what how deep and broad to test.

score 2 · Answer 6 · answered Dec 29 '10 at 15:57

I think you can't and shouldn't make an unit test for all possible errors you might face (what if a meteorite hits the db server?) - you should make an effort to test errors with reasonably probablity and/or rely or another services. For example; if your application requires the correct arrival of network packets; you should use the TCP transport layer: it guarantees the correctness of the received packets transparently, so you only have to concentrace eg. what happens if network connection is dropped. Checksums are designed to detect or correct a reasonable amount of errors - if you expect 10 errors per file, you would use different checksum than if you expect 100 errors. If the chosen checksum indicates that the file is correct, than you have no reason to think it's broken (the probablity that it is broken is negligible). Because you don't have infinite resources (eg. time) you have to make compromises when you write your tests; and choosing these compromises it a tough question.

score 2 · Answer 7 · answered Dec 29 '10 at 16:45

Although not a complete answer to the massive dilema you face, you can reduce the amount of tests by using a technique called Equivalence Partitioning.

In my organization, we perform many levels of coverage, regression, positive, negative, scenario based, UI in automated and manual tests, all starting from a 'clean environment', but even that isn't perfect.

As for one of the cases you mention, where a programmer comes in and changes some sensitive detection code and no one notices, we would have had a snapshot of data that is 'behaviourally dodgy', which fails consistently with a specific test to test the detection routine - and we would run all tests regularly (and not just at the last minute).

Kristopher Johnson · Answer 8 · 2010-12-29T15:58:03.683

Sometimes I'll create two (or more) test suites. One suite uses mocks/stubs and only tests the code I'm writing. The other tests test the database, web sites, network devices, other servers, and whatever else is outside of my control.

Those other tests are really tests of my assumptions about the systems my code interacts with. So if they fail, I know my requirements have changed. I can then update my internal tests to reflect whatever new behavior my code needs to have.

The internal tests include tests that simulate various failures of the external systems. Whenever I observe a new kind of failure, either through my other tests or as a result of a bug report, I have a new internal test to write.

Writing tests that model all the bizarre things that happen in the real world can be challenging, but the result is that you really think about all those cases, and produce robust code.

That's an excellent way to put it, the "assumptions about the systems my code interacts with". — Kim Sun-wu, Dec 29 '10 at 15:52

Zac Howland · Answer 9 · 2010-12-29T18:25:18.860

1

The proper use of Unit Testing starts from the ground up. That is, you write your unit tests BEFORE you write your production code. The unit tests are then forced to consider error conditions, pre-conditions, post-conditions, etc. Once you write your production code (and the unit tests are able to compile and run successfully), if someone makes a change to the code that changes any of its conditions (even subtly), the unit test will fail and you will learn about it very quickly (either via compiler error or via a failed unit test).

EDIT: Regarding the updated question

What you are trying to test is not really suited well for unit testing. Networking and database connections test better in a simulated integration test. There are far too many things that can break during the initialization of a remote connection to create a useful unit test for it (I'm sure there are some unit-tests-fix-all people that will disagree with me there, but in my experience, trying to unit test network traffic and/or remote database functionality is worse than shoving a square peg though a round hole).

edited Dec 29 '10 at 18:25

answered Dec 29 '10 at 15:52

Zac Howland

15,777
1
26
42

3

It doesn't start with tests, it starts with interface design, without which you wouldn't know what to test. The difficult part is that interfaces evolve during design and implementation, so test harness is never completed before the design is settled and a reasonable implementation is in place. – Gene Bushuyev Dec 29 '10 at 16:06
@Gene: Unit Test design usually leads to interface design. If your interfaces are evolving rapidly during implementation, you likely did not think about how the interface would be used during the design of the unit tests (or have changed how you intend the interface to be used - which does happen from time to time). The point is, you start writing your tests which will NOT compile due to the interface and implementation not existing yet. The tests will not pass until they exist and meet your pre- and post- conditions. – Zac Howland Dec 29 '10 at 16:12
1

@Gene - another way to look at it (and this a popular approach) is to "grow" the interfaces, tests and implementations all in parallel, starting with the simplest possible form of interface, a complete test of that interface and the simplest implementation that will satisfy the test. Then you keep modifying all three as necessary until your customer is happy. – Daniel Earwicker Dec 29 '10 at 17:39

score 1 · Answer 10 · answered Dec 29 '10 at 19:02

You are talking about library or application testing, which is not the same as unit testing. You can use unit testing libraries such as CppUnit/NUnit/JUnit for library and regression testing purposes, but as others have said, unit testing is about testing your lowest level functions, which are supposed to be very well defined and easily separated from the rest of the code. Sure, you could pass all low-level unit tests, and still have a network failure in the full system.

Library testing can be very difficult, because sometimes only a human can evaluate the output for correctness. Consider a vector graphics or font rendering library; there's no single perfect output, and you may get a completely different result based on the video card in your machine.

Or testing a PDF parser or a C++ compiler is dauntingly difficult, due to the enormous number of possible inputs. This is when owning 10 years of customer samples and defect history is way more valuable than the source code itself. Almost anyone can sit down and code it, but initially you won't have a way of validating your program for correctness.

score 0 · Answer 11 · answered Dec 29 '10 at 15:52

The beauty of mock objects is that you can have more than one. Assume that you are programming against a well-defined interface for a network stack. Then you can have a mock object WellBehavingNetworkStack to test the normal case and another mock object OddlyBehavingNetworkStack that simulates some of the network failures that you expect.

Using unit tests I usually also test argument validation (like ensuring that my code throws NullPointerExceptions), and this is easy in Java, but difficult in C++, since in the latter language you can hit undefined behavior quite easily, and then all bets are off. Therefore you cannot be strictly sure that your unit tests work, even if they seem to. But still you can test for odd situations that do not invoke undefined behavior, which should be quite a lot in well-written code.

Updated my question explaining why I think that's a bad idea. Please correct me if I'm wrong. — Kim Sun-wu, Dec 29 '10 at 16:30

k rey · Answer 12 · 2010-12-29T17:06:59.897

0

What you are talking about is making applications more robust. That is, you want them to handle failures elegantly. However, testing every possible real world failure scenario would be difficult if not impossible. The key to making applications robust is to assume that failure is normal and should be expected at some point in the future. How an application handles failure really depends on the situation. There are a number of different ways to detect and handle failure (maybe a good question to ask the group). Trying to rely on unit testing alone will only get you part of the way. Anticipating failure (even on some simple operations) will get you even closer to a more robust application. Amazon built thier entire system to anticipate all types of failures (hardware, software, memory and file corruption). Take a look at thier Dynamo for an example of real world error handling.

edited Dec 29 '10 at 17:06

answered Dec 29 '10 at 16:54

k rey

611
4
11

@k rey: While the goal is indeed to make a more robust application, this question isn't related to either detecting, or handling failure. To me, it's about how we can reliably, and automatically test for known failure conditions, and our handling of those, given an unpredictable stack. I hope that makes some sense. – Kim Sun-wu Dec 29 '10 at 17:01
You are not alone with this issue. Take a look at what Amazon did to handle this type of real world problem. – k rey Dec 29 '10 at 17:08
@k rey: Interestingly enough, Dynamo brings up a key point in this sort of thing. IIRC, Dynamo caused S3 to go down for an extended period of time last year, I believe. I believe a solution like Dynamo (and, any P2P solution, but I'm very biased here), is just introducing more subtle edge cases which will lead to even worse failures at some point. Jeff Dean forever! – Kim Sun-wu Dec 29 '10 at 17:12

Unit testing handling of degraded network stack, file corruption, and other imperfections

12 Answers12

Linked