C - adding two single-precision floating point normal numbers, can't get result to infinity

Question

I'm playing around with floating-point arithmetic, and I encountered something which needs explaining.

When setting rounding mode to 'towards zero', aka:

fesetround(FE_TOWARDZERO);

And when adding various normal positive numbers, I can never reach infinity.

However, IEEE 754 says that overflow to infinity can result from adding finite numbers.

For instance:

#include <fenv.h>
#include <stdio.h>

float hex2float (int hex_num) {
  return *(float*)&hex_num;
}

int main(void) {
  int a_int = 0x7f7fffff; // Maximum finite single precision number, about 3.4E38
  int b_int = 0x7f7fffff;
  float a = hex2float(a_int);
  float b = hex2float(b_int);
  float res_add;

  fesetround(FE_TOWARDZERO);  // need to include fenv.h for that
  printf("Calculating... %+e + %+e\n",a,b);
  res_add = a + b;
  printf("Res = %+e\n",res_add);
}

However, if I change the rounding mode to something else, I may get +INF as the answer.

Can someone explain this?

I have no idea what `hex2float` does, as you don't even mention the library you are using. But I am pretty sure it is not taking `int`s.. — Eugene Sh., May 25 '16 at 13:54
@EugeneSh.: Of course, but well defined in all actual practice. — Dolda2000, May 25 '16 at 13:59
Anyway, bad function naming. It has nothing to do with `hex`. — Eugene Sh., May 25 '16 at 14:00
Should definitely be named `int2float()`, as there is no data type `hex`. — alk, May 25 '16 at 14:01
@EugeneSh.: I agree, but that has very little to do with the actual question. :) — Dolda2000, May 25 '16 at 14:01
And anyway, this is UB. (http://stackoverflow.com/q/98650/694576) — alk, May 25 '16 at 14:02
@Dolda2000 Yeah, just commenting :) Had to figure it out before looking at the actual problem.. — Eugene Sh., May 25 '16 at 14:02
@alk: If you're referring to the pointer aliasing, then perhaps it is, but that is not the cause of the behavior being asked about. — Dolda2000, May 25 '16 at 14:04
@EugeneSh.: Having tested it, the result is `MAXFLOAT`, just like `a` and `b` are. — Dolda2000, May 25 '16 at 14:05
Isn't that what you would expect in round-to-zero? The real number result of the addition is bigger than any finite float, but gets rounded to the largest magnitude float that is no further from zero, MAXFLOAT. — Patricia Shanahan, May 25 '16 at 14:14
I think [this](http://www.gnu.org/software/libc/manual/html_node/Rounding.html) is pretty comprehensive... — Eugene Sh., May 25 '16 at 14:14
And as for limit constants, you should use predefined macros instead of making obscure pointer magic: http://en.cppreference.com/w/c/types/limits — Eugene Sh., May 25 '16 at 14:18
@PatriciaShanahan: By that logic, you could also argue that `FE_TONEAREST` rounding should also have `MAXFLOAT + MAXFLOAT == MAXFLOAT`, since any finite result will be mathematically "nearer" to `MAXFLOAT` than to infinity. ;) — Dolda2000, May 25 '16 at 14:23
Thank you all. By the way, I use the name hex2float as opposed to int2float, because the conversion is not done between integer and float. i.e: 7 to 7.0. It is done from the binary (or hexadecimal) representation of the float type to a floating point type. — Shay Golan, May 25 '16 at 14:27
@ShayGolan Binary/hexadecimal/decimal/octal or any other representations are absolutely equivalent. Why are you discriminating decimal while saying " the binary (or hexadecimal)"? — Eugene Sh., May 25 '16 at 14:33
Yes, you have a point. But I preferred to name the function in this way to avoid confusion with conversion between integer and float. — Shay Golan, May 25 '16 at 14:40
Using `float a = FLT_MAX;` would have avoided the distracting discussion about `hex2float()`. — chux - Reinstate Monica, May 25 '16 at 15:10

score 6 · Accepted Answer · edited Jun 20 '20 at 09:12

The explanation for the observed behavior is that it is mandated by the IEEE 754-2008 floating-point standard:

7.4 Overflow

The overflow exception shall be signaled if and only if the destination format’s largest finite number is exceeded in magnitude by what would have been the rounded floating-point result (see 4) were the exponent range unbounded. The default result shall be determined by the rounding-direction attribute and the sign of the intermediate result as follows:

[...]

b) roundTowardZero carries all overflows to the format’s largest finite number with the sign of the intermediate result.

So for the rounding mode used here (truncation, or rounding towards zero), the result in case of overflow is the largest finite number, not infinity.
