What do floating-point numbers do when they encounter an integer they cannot represent?

Question

We know that, for a floating-point format with an n-bit fraction, there is a formula for the smallest positive integer that cannot be represented exactly (because it would require an (n+1)-bit fraction to be exact): 2^(n+1) + 1. So for the single-precision format, where the number of fraction bits is n = 23:

the smallest positive integer that cannot be represented exactly is 2^24 + 1. So my question is: let's say we happen to produce this number as

float a = ...;
float b = ...;
float c = a + b;  // where a + b is 2^24 + 1

So what does C do here? Let it overflow? And how can we be confident using floating-point numbers when there is always a chance of encountering an integer that cannot be represented, which reduces precision and could have serious consequences in a banking system?

Please read [Why not use Double or Float to represent currency?](https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency) Apart from using integers to represent currency, don't *expect* a floating point value to be accurate. There are only so many discrete values (from an infinite range) that can be represented in a finite storage space. — Weather Vane, Jun 28 '20 at 13:29
2^24 - 1 is most certainly representable as a `float`. The smallest non-representable positive integer is 2^24 + 1. If you want to be able to represent any `int`, use `double` instead of `float`. In fact, unless you have a really good reason, always use `double`. — rici, Jun 28 '20 at 16:34
@rici Sorry, that was a typo; I did mean 2^24 + 1. So even if I use `double`, there's still a chance that I will deal with 2^53 + 1, which still cannot be represented exactly by `double`? — , Jun 29 '20 at 01:20
@amjad: there are only a finite number of possible representable numbers, and some of them are not integers. So some integers are not representable precisely. But 2^53 is bigger than any `int` (on most platforms), so it's possible to say that every `int` has a precise `double` representation. Which is sometimes useful (but not guaranteed by the standard). — rici, Jun 29 '20 at 01:26

score 0 · Accepted Answer · answered Jun 28 '20 at 13:37

So what does C do here?

The C standard leaves it to the implementation to specify what happens. C 2018 5.2.4.2.2 7 says:

The accuracy of the floating-point operations (+, -, *, /) and of the library functions in <math.h> and <complex.h> that return floating-point results is implementation-defined, as is the accuracy of the conversion between floating-point internal representations and string representations performed by the library functions in <stdio.h>, <stdlib.h>, and <wchar.h>. The implementation may state that the accuracy is unknown.

Commonly, for the operations +, -, *, and /, C implementations produce the representable result nearest the real-number result, with ties rounded to the representable number with an even low digit. For the example in the question, 2^24 + 1 lies exactly halfway between the representable values 2^24 and 2^24 + 2, so such an implementation produces 2^24. One caveat is that some C implementations may use more precision than the nominal type for intermediate results (such as calculating all of an expression using long double even though the expression contains only double numbers, and then converting the final result to double for assignment).

For “simple” math routines, such as fma, results rounded as described above are commonly returned. For complicated functions, such as trigonometric and logarithmic functions, C implementations vary in accuracy.

For conversions between floating-point and decimal text, such as with scanf, printf, and strtod, implementations vary in quality. Good implementations will produce correctly rounded results.

let it overflow?

“Overflow” in floating-point refers to a result that exceeds the finite range of the floating-point format, that is, a result that, even when rounding is considered, exceeds in magnitude the largest finite value representable in the format. It does not refer to a result that exceeds the range within which all integers are representable.

And how can we be [confident] to use float point numbers as there is always a chance to encounter an [integer] number that cannot be represented, which reduce precision, which could causes serious [consequence] in banking system?

Since the required guarantees are not provided by the C standard, one must use documentation for the C implementation one is using or avoid floating-point arithmetic in C. Many C implementations use the IEEE-754 standard for floating-point arithmetic to some extent, and it provides requirements about rounding behavior. And a software engineer must choose a floating-point format with sufficient precision for the purpose at hand.

Note that this issue is not unique to floating-point arithmetic. Integer arithmetic is similarly incapable of representing numbers that arise in financial arithmetic and other situations. As one example, calculating interest quickly gives rise to fractions of pennies, so the programmer must design their software to do the math correctly even when fractions cannot be represented. And, of course, integer arithmetic can overflow. Regardless of the number format used, a software engineer must pay attention to its properties and limits.
