7

When working on a large software project I often use fuzz-testing as part of my test cases to help smoke out bugs that may only show up when the input reaches a certain size or shape. I've done this most commonly by just using the standard random number facilities that are bundled with the programming language I happen to be using.

Recently I've started wondering, ignoring the advantages or disadvantages of fuzz testing in general, whether or not it's a good idea to be using non-cryptographically secure pseudorandom number generators when doing fuzz testing. Weak random number generators often exhibit patterns that distinguish them from true random sequences, even if those patterns are not readily obvious. It seems like a fuzz test using a weak PRNG might always fail to trigger certain latent bugs that only show up in certain circumstances because the pseudorandom numbers could be related to one another in a way that never trigger those circumstances.

Is it inherently unwise to use a weak PRNG for fuzz testing? If it is theoretically unsound to do so, is it still reasonable in practice?

GEOCHET
  • 21,119
  • 15
  • 74
  • 98
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065

3 Answers3

6

You're confusing two very different grades of "weakness":

  • statistical weakness means the output of the PRNG exhibits statistical patterns, such as having certain sequences occur more often than others. This could actually lead to ineffective fuzz testing in some rare cases. Statistically strong PRNGs are performant and widely available though (most prominently the Mersenne Twister).
  • cryptographical weakness means that the output of the RNG is in some way predictable given knowledge other than the seed (such as the output itself). It makes absolutley no sense to require a PRNG used for fuzz testing to be cryptographically strong, because the "patterns" exhibited by statistically-strong-but-cryptographically-weak PRNGs are pretty much only an issue if you need to prevent a cryptographically versed attacker from predicting the output.
Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
  • So I understand what you're saying, but I'm still a bit confused. If you have something that's statistically but not cryptographically random, couldn't the fact that future terms of the output can be predicted from the previous terms mean that some possible fuzz sequences could not be generated? I guess what I'm thinking here is that if the output is somehow predictable from the previous terms, the fuzz sequence generated isn't "random enough." I'm probably missing something since what you're saying makes sense, but I can't quite convince myself why the distinction matters. – templatetypedef Mar 28 '11 at 09:39
  • @templatetypedef: Cryptographically strong PRNGs are just as predictable if you know their seed. The difference is that this is the *only* thing that allows you to predict the output. – Michael Borgwardt Mar 28 '11 at 10:03
  • I understand, but my question is that if statistically random PRNGs can have their next outputs predicted from their previous outputs, wouldn't that mean that there are certain random sequences that they couldn't generate (since given some particular sequence of outputs the following outputs would be fixed?) – templatetypedef Mar 28 '11 at 10:22
  • @templatetypedef: what you're saying is true also of cryptographically strong PRNGs. For example consider RC4 (which admittedly is a bit broken). It has 256! possible internal states, which is 2^1684. If it has period N, then there are 256^N different N-byte sequences that you might hope for it to output. I don't know the period of RC4, but it's a lot more than 200, so clearly it doesn't output all of them. Similar analysis will apply to the state size and period of any PRNG that's any good, since it will have colossal period. Just pick a big size, it won't output all sequences that big. – Steve Jessop Mar 28 '11 at 11:22
  • 1
    So, if you use N bytes of "random" data to construct a fuzz test case, then you certainly need a PRNG with at least N bytes of internal state if you want all potential inputs to be possible. That's true regardless of cryptographic security. Note that cryptographic security is not (for most PRNGs) a property of the algorithm, it's a statement of our own ignorance about the algorithm. Blum Blum Shub is unusual in that it has a rather strong proof of unpredictability - saying that it's cryptographically secure is a statement of our ignorance how to factor numbers quickly. – Steve Jessop Mar 28 '11 at 11:29
4

I don't think it really matters, but I can't prove it.

Fuzz-testing will only try some inputs, in most cases a minuscule proportion of the possibilities. No matter how good the RNG you use, it may or may not find one of the inputs that breaks your code, depending on what proportion of all possible inputs break your code. Unless the pattern in the PRNG is very simple, it seems to me unlikely that it will correspond in any way to a pattern in the "bad" inputs you're looking for, so it will hit it neither more nor less than true random.

In fact, if you knew how to pick an RNG to maximise the probability of it finding bad inputs, you could probably use that knowledge to help find the bug more directly...

I don't think you should use a really bad PRNG. rand for example is permitted to exhibit very simple patterns, such as the LSB alternating. And if your code uses a PRNG internally, you probably want to avoid using the same PRNG in a similar way in the test, just to be sure you don't accidentally only test cases where the input data matches the internally-generated number stream! Small risk, of course, since you'd hope they'll be using different seeds, but still.

It's usually not that hard in a given language to find crypto or at least secure hash libraries. SHA-1 is everywhere and easy to use to generate a stream, or failing that RC4 is trivial to implement yourself. Both provide pretty good PRNG, if not quite so secure as Blum Blum Shub. I'd have thought the main concern is speed - if for example a Mersenne Twister can generate fuzz test cases 10 times as fast, and the code under test is reasonably fast, then it might have a better chance of finding bad inputs in a given time regardless of the fact that given 624 outputs you can deduce the complete state of the RNG...

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
2

You don't need unpredictable source (that's exactly what a cryptographically secure generator is), you only need a source with good statistical properties.

So using a general purpose generator is enough - it is fast and usually reproduceable (which means problems are also reproduceable).

sharptooth
  • 167,383
  • 100
  • 513
  • 979