OpenCL kernel float division gives different result

Question

I have an OpenCL kernel for some computation. Only one thread gives a result that differs from the CPU code. I am using VS2010, x64, release mode.

While checking the OpenCL code with some examples, I found some interesting results. Here are the test cases from the kernel code.

I tested 3 cases in the OpenCL kernel; the precision is checked with printf("%.10f", fval);

case 1:

float fval = (10296184.0) / (float)(x*y*z);  // which gives result fval = 3351.6225585938

float fval = (10296184.0f) / (float)(x*y*z);  // which gives result fval = 3351.6225585938

The variables are: int x, y, z

These values are computed by some operations, and they are x=12, y=16, z=16.

case 2:

float fval = (10296184.0) / (float)(12*16*16); // which gives result fval = 3351.6223144531

float fval = (10296184.0f) / (float)(12*16*16); // which gives result fval = 3351.6223144531

case 3:

However, when I compute the difference between the two expressions above, the result is 0 when using 10296184.0, but not when using 10296184.0f.

float fval = (10296184.0) / (float)(x*y*z) - (10296184.0) / (float)(12*16*16); // which gives result fval = 0.0000000000

float fval = (10296184.0f) / (float)(x*y*z) - (10296184.0f) / (float)(12*16*16); // which gives result fval = 0.0001812663

Could anyone explain the reason or give me some hints?

Please see [Is floating point math broken](http://stackoverflow.com/questions/588004/is-floating-point-math-broken)? These inaccuracies are one reason why I always use `double` (although it suffers the same thing) unless some constraint forces me to use the inferior `float`. — Weather Vane, Aug 18 '16 at 19:17
If you need more precision than that, that's why double-precision FP is a thing on GPUs. It has very little application in rendering. — Andon M. Coleman, Aug 18 '16 at 19:20
Post how the values of `3351.6226, 3351.6223 and 0` were determined. `printf("%f", ...)`, debugger, etc. — chux - Reinstate Monica, Aug 18 '16 at 22:18
The key answer is the one given by @chux. It is the mismatch between compile-time division and run-time division. Precision differs between the two. — DarkZeros, Aug 19 '16 at 09:13
What compiler are you using, what platform, what compiler settings? What is the result of `fegetround()` from `<fenv.h>`? What is the value of `FLT_EVAL_METHOD` from `<float.h>`? Take this info, the answer(s) below, the results of the function and macro and we likely have all the additional info needed. The larger issue is in [FP broken?](http://stackoverflow.com/questions/588004/is-floating-point-math-broken) and how code should be written differently to be less susceptible to small ULP errors. — chux - Reinstate Monica, Aug 19 '16 at 12:56
Notice that you do not know if the result is _exactly_ 0.0 by using `printf("%.10f", fval);`. You just know its magnitude is smaller than 0.00000000005. Use `e` instead of `f`: `printf("%.10e", fval);` — chux - Reinstate Monica, Aug 19 '16 at 13:01

chux - Reinstate Monica · Answer 1 · 2016-08-18T22:05:38.110

Some observations:

The two float values differ by 1 ULP, so the results differ by the minimum possible amount.

// Float ULP in the 2's place here
//       v
0x1.a2f3ea0000000p+11 3351.622314... // OP's lower float value 
0x1.a2f3eaaaaaaabp+11 3351.622395... // higher precision quotient
0x1.a2f3ec0000000p+11 3351.622558... // OP's higher float value

(10296184.0) / (float)(12*16*16) is calculated at compile time and is the closer result to the expected mathematical answer.

float fval = (10296184.0) / (float)(x*y*z) is calculated at run time.

Considering that float variables are being used, it is surprising that the code does this division with double math. This is a double constant divided by a double (the promotion of the float product), resulting in a double quotient that is converted to a float and then saved. I'd expect 10296184.0f - note the f - to have been used; then the math could all have been done as floats.

C allows different rounding modes, denoted by FLT_ROUNDS. The mode may differ between compile time and run time, which may explain the difference. Knowing the result of fegetround() (the function that gets the current rounding direction) would help.

OP may have employed various compiler optimizations that sacrifice precision for speed.

C does not specify the precision of math operations, yet results good to the last ULP should be expected from * / + - sqrt() modf() on quality platforms. I suspect the code suffers from a weak math implementation.
