Random generation of C programs with floating-point

Question

Does anyone know a random generator of C programs that include floating-point computations?

I am looking for something a little like Csmith, except that Csmith does not generate floating-point expressions, and it generates tons of other constructs, which makes it somewhat difficult to modify. Generating sequential computations would be a good start for my purpose, as long as some of them are floating-point computations. Conditionals would be even better, but I would not need loops, pointers, or even arrays.

Since so many languages use a C-like syntax, such a generator need not be specific to C. Even if it is specific to another C-like language, I might be able to text-process a program generated for that language into a C program.

EDIT: here is a snippet of a Csmith-generated program to clarify what I am looking for.

...
int64_t *l_374 = &g_189;
int32_t l_375 = (-1L);
int i, j, k;
l_375 &= ((g_106 == ((*l_374) = (&g_324[4] == l_373[0][0][5]))) < 0x80C8L);
return (*g_207);
...

I should also clarify that while taking a Csmith program and substituting, say, int64_t with float may give a syntactically correct C program, it will almost certainly not give a defined program. I can test whether a substituted program contains undefined behavior, but this is not cheap, and if I have to reject 99% of substituted programs because they are undefined, the process will be too slow to be useful.

I'm totally sleep deprived right now, so sorry if this is stupid, but am I reading this right? You want to randomly generate C programs? Far out! Good on you! Why? :P — TheIronKnuckle, Dec 14 '11 at 07:32
@TheIronKnuckle For "differential testing": compile a randomly generated *defined* C program with two different compilers, and if the programs give different results, you have found a bug in one of the compilers. http://www.linux-mips.org/pub/linux/mips/people/macro/DEC/DTJ/DTJT08/DTJT08PF.PDF — Pascal Cuoq, Dec 14 '11 at 09:01
@TheIronKnuckle Well, it is a little bit more complicated with floating-point, because floating-point is underspecified in C99, and two compilers would both be correct and give different results in presence of floating-point. This is why the feature was not included in Csmith. But I still think I would be able to use random floating-point programs for my purpose. — Pascal Cuoq, Dec 14 '11 at 09:05
It seems to me that the programs shouldn't be random. You should look into creating programs that achieve perfect coverage instead. Test all possible cases, special cases, corner cases... Using some kind of script, you could also put @VALUE@ "variables" where static floating-point values go, and the script could then replace them with random numbers (which are still compile-time constants to the C compiler). — Alexis Wilke, Dec 21 '11 at 06:11
@AlexisWilke [Monkey testing](http://en.wikipedia.org/wiki/Monkey_test) is just one type of unit testing, it doesn't exclude the possibility of any other ways of (automated) testing and arguably has its merits. — Eric, Dec 21 '11 at 13:08
Certainly we could write one ourselves. What do you want to do with such a generator? Test compilers? — Timothy, Dec 22 '11 at 05:13
@Skyler Yes, testing compilers is one thing I want to do. I have a theory that when they use SSE2 (with its exact implementation of IEEE 754 double-precision arithmetic), compilers have no excuse to produce different floating-point results, and I want to test this theory. I also have a static analyzer that I claim can predict all possible results of a compilation using the historical x87 80-bit instructions, so I intend to test that too. — Pascal Cuoq, Dec 22 '11 at 09:50
@AlexisWilke I am not sure what you mean by "perfect coverage", but if you mean making sure that all branches are taken in the target code, you are underestimating the difficulty of making sure that floating-point code is bit-perfect. The bugs do not come only from the control flow. Besides, as Eric said, you can both test "all possible cases, special cases, corner cases" and do random testing in addition. You'd be surprised how much it still finds when applied last. — Pascal Cuoq, Dec 22 '11 at 10:06
Well... you just said to Skyler that all the outputs should be bit-perfect (he! he!), so yes, testing to the bit, you should get the same results in all cases. But no, perfect coverage encompasses all possible values rather than just all possible branches. So if you test addition, you should check X + Y = Z for all possible X and all possible Y. I know, with double-precision floats, that's a bit much. Still, you can check special cases, such as the range from -1.0 to +1.0 (and even that range alone is enormous), plus random cases for smaller and larger values. That's what I would do, at least. — Alexis Wilke, Dec 22 '11 at 23:51
@AlexisWilke There are nearly 2^64 doubles, roughly half of which (2^63) are between -1.0 and 1.0. It's not just "enormous". You can't enumerate them using one cycle per number, much less test a simple binary operation such as addition. — Pascal Cuoq, Dec 23 '11 at 00:25

Pascal Cuoq · Answer 1 · 2011-12-22T11:09:03.263

I have started on a small floating-point fuzzer. It does little for now, but you have to start with something.

Here is an example of its use for comparing compilers that generate SSE2 instructions, which I claim have no excuse for producing different results:

#include <stdio.h>
double x0 = 35945970.47e-83;
double x1 = (973e-37+(5626073.612783921311173024e-76*231.106261545926274055e1*66390306733994e-1*420514.99786508*654374994.1249111e-35*5201.6039804e56)+(2.93604195+33e-50)+(969222843.32046212043603+1734e01)+(0166605914e8+6701040019050623e-23+32591206968562.6e-11+90771798.753788905)+(328e-49/944642906580982081e7));

int main(){
  x0 = (((x1*534425399171e0)*(x1*x0*x0)*(x1*x0*57063248719.703555336277392e-36*x0*472e57*65189741246535e-1)*x1*(x1/22393742341e70)*(x1+x0+x0+x0))-((843193503867271987e3*61.949746266e23*x1*x1*x0)/(x1/x1)));
  x0 = ((x0+x1+x1+x1+x0)-(x0*506680.0005767722e66*396.650621163*70798334426455964.1*x1*305369e14));
  x1 = 660098705340e-21;
  printf("%a\n", x0);
}

For this program, gcc and clang (which on this platform generate SSE2 instructions) generate executables that compute the same thing:

~/genfloat $ gcc t.c ; ./a.out 
0x1.5c5a77a63c1d6p+430
~/genfloat $ clang t.c ; ./a.out 
0x1.5c5a77a63c1d6p+430

I also intend to test a static analyzer that is supposed to predict all possible results that can be obtained with a program compiled with x87 instructions, spilling some intermediate results to double-precision memory locations in an unpredictable fashion:

~/genfloat $ frama-c -val -float-hex -all-rounding-modes t.c 
...
      x0 ∈ [0x1.5c5a77a63c1cap430 .. 0x1.5c5a77a63c1e8p430]

The above is a strong claim that needs to be tested.

Basile Starynkevitch · Answer 2 · 2012-10-05T22:06:04.883

My manydl.c program is doing something very similar (on integers). You might adapt it quite easily to your needs.

I wrote it as a tiny hack to convince some people, notably Jacques Pitrat, that a Linux system can dlopen a very large number (hundreds of thousands) of shared objects. The program generates random C code, focused on integers, then compiles, dlopens, and executes a large number of these generated modules. I designed manydl.c so that it generates random but terminating C programs, so you could adapt it to floating point (just choose operations that are terminating and cheap, as I did).

Ask me more at coffee time (since we are close colleagues).
