1

I am writing in C++, using conformant IEEE arithmetic in round-to-nearest mode. If a is a positive short int (16 bits) and b is a float (32 bits), where 0 <= b < 1, does a*b < a always evaluate to true?

John Jumper
  • Does "positive short int" include zero ? (according to no less an authority than http://en.wikipedia.org/wiki/IEEE_floating_point, signed-zeros are allowed). If so, then a=0, b=0, a*b is not < a. – racraman Feb 12 '15 at 03:02
  • I meant 0 < a. C++ integers do not have signed zero. – John Jumper Feb 12 '15 at 03:11

1 Answer

2

Maybe. It depends on how the compiler decides to evaluate floating-point expressions (read about FLT_EVAL_METHOD, introduced by C99 and now part of the C++ standard, if you want the gory details).

As soon as a can be greater than 4, the product a*b expressed as a float will round up to a when b is "big enough", for example b = 1-ε/2 (where ε is the difference between 1.0 and the next representable number, 2^-23). But if the compiler does not round the intermediate result to float before the comparison, the product may be kept in some wider internal precision in which a*b is still different from a, and the comparison done at that internal precision will always be true. This case is not uncommon: because of the design of the x87 coprocessor, keeping all intermediate results as 64-bit long double was typical of the 32-bit x86 architecture, for example; a 53-bit double would also keep all the values distinct, since the exact product of a 24-bit significand and a 16-bit integer needs at most 24+16 = 40 bits, and 40 < 53.
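If you want to see what your own toolchain does, a minimal sketch along the following lines (the particular values of a and b are arbitrary choices for illustration) prints FLT_EVAL_METHOD and performs the raw comparison, so you can compare, say, x87 and SSE code generation:

  #include <cfloat>   // FLT_EVAL_METHOD
  #include <cmath>    // std::nextafterf
  #include <cstdio>

  int main() {
      short a = 5;                            // any positive short value
      float b = std::nextafterf(1.0f, 0.0f);  // largest float below 1, i.e. 1 - 2^-24

      std::printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
      std::printf("a*b < a evaluates to %s\n", (a * b < a) ? "true" : "false");
      return 0;
  }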

Assuming there are no bugs in your compiler, an explicit cast to float should force the rounding, so (float)(a*b) < a should sometimes evaluate to false. Be especially cautious here, since this area is known to exhibit compiler bugs, particularly because floating-point is declared "reserved to experts" and programmers are generally advised not to rely on details like these. In particular, take care not to activate optimization options (like /fp:fast) of your compiler, which are very likely to skip the rounding operation to improve performance.
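As a sketch of how one might exercise that cast over the whole range (assuming a 16-bit short, so the positive values are 1..32767, and taking the largest float below 1 as the worst case), something like this counts how many comparisons with the forced rounding are not strictly less; run it with and without fast-math options to see the difference, if any:

  #include <cmath>
  #include <cstdio>

  int main() {
      float b = std::nextafterf(1.0f, 0.0f);  // largest float less than 1
      long not_less = 0;
      for (int i = 1; i <= 32767; ++i) {      // every positive 16-bit short value
          short a = (short)i;
          if (!((float)(a * b) < a))          // the cast forces the product to be rounded to float
              ++not_less;
      }
      std::printf("comparisons that were not strictly less: %ld\n", not_less);
      return 0;
  }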

A safer (but still not completely safe) way to perform the test is to explicitly store the result of the multiplication into a float variable, as in

  float c = a * b;
  if (c < a) call_Houston();

Here again, the C++ standard requires explicit rounding (which is quite logical, since the value of the expression has to be stored into the 32-bit float variable). But here again, some clever compilers, particularly in optimization mode, might notice that the expression is reused just afterwards, take the short path, reuse the in-register evaluation (which has more precision), and ruin your efforts (leaving Houston unaware). GCC used to recommend, in such cases, begging the compiler with code like

  volatile float c = a * b;
  if (c < a) call_Houston();

and to use specific options like -ffloat-store. This does not prevent loss of sanity points. By the way, recent versions of GCC are much saner in this respect (since bug 323 was fixed).

AntoineL
  • Thanks for a very detailed answer. I did not know about `FLT_EVAL_METHOD`. I am not sure that `b` can ever be "big enough". Enumerating all cases with `b = 1.f - machine_epsilon / 2` seems to say that it is not possible for `(float)(a*b) == a`. Please excuse the Python, but it makes it easier to be sure about the rounding.

    >>> import numpy
    >>> machine_eps = numpy.finfo(numpy.float32).eps
    >>> a = numpy.arange(1, 2**16, dtype=numpy.float32)
    >>> b = numpy.float32(1.) - numpy.float32(machine_eps*0.5)
    >>> print repr(b)
    0.99999994
    >>> all(a*b < a)
    True

    – John Jumper Mar 05 '15 at 20:37
  • @JohnJumper: for exactly the same reasons as above (just replace "your compiler" with "the C compiler(s) used to compile Python and NumPy"), you cannot be sure that the test is evaluated using the float specifications. – AntoineL Mar 07 '15 at 12:41