This has a loop-carried dependency and not enough independent work to run in parallel, so if the loop is executed at all, what you measure is not FLOPs but most likely the latency of a floating-point addition. The loop-carried dependency chain serializes all of those additions. That chain has small side-chains of multiplications hanging off it, but those don't depend on each other, so only their throughput matters, and multiplication throughput beats addition latency on any reasonable processor.
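To make that concrete (your exact code isn't repeated here, so this is only a hypothetical reduction of the pattern described above), the problem looks like this:

```c
/* Hypothetical shape of the measured loop (not your exact code): every
 * `+=` must wait for the previous one -- a loop-carried chain through
 * `sum` -- while each multiply is independent of the others and limited
 * only by throughput. The adds therefore retire at 1 per add-latency,
 * regardless of how wide the core is. */
double serial_dot(const double *x, const double *y, long n) {
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```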
To actually measure FLOPs, there is no single recipe. The optimal conditions depend strongly on the microarchitecture: how many independent dependency chains you need, the best add/mul ratio, whether you should use FMA, all of it varies. Typically you have to write something more elaborate than what you have, and if you're set on using a high-level language, you also have to somehow trick the compiler into not optimizing the work away entirely.
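As a rough illustration of what "more complicated" means, here is a minimal scalar-only sketch in C: several independent accumulators so the adds are not stuck on a single chain, and the results are printed so the compiler can't delete the loop. The accumulator count (8 here) is a guess; the right number depends on the add latency and throughput of your core, and without SIMD or FMA this still reaches only a fraction of peak.

```c
/* Build e.g.: cc -O2 flops.c -o flops  (avoid -ffast-math here) */
#include <stdio.h>
#include <time.h>

#define N 100000000L

int main(void) {
    /* Several independent accumulators so the additions don't all sit on
     * one loop-carried chain. 8 is a guess; the optimum is roughly
     * add-latency * adds-per-cycle for the target core. */
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    double a4 = 0.0, a5 = 0.0, a6 = 0.0, a7 = 0.0;
    double x = 1.000000001;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++) {
        a0 += x; a1 += x; a2 += x; a3 += x;
        a4 += x; a5 += x; a6 += x; a7 += x;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double flops = 8.0 * (double)N / secs;   /* 8 adds per iteration */

    /* Print the sum so the compiler can't prove the loop is dead code. */
    printf("sum = %f, approx %.3g FLOP/s (scalar adds only)\n",
           a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7, flops);
    return 0;
}
```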
For inspiration, see "How do I achieve the theoretical maximum of 4 FLOPs per cycle?"