
It appears that JavaScript's number type is exactly the same as C and C++'s double type, and both are IEEE 754-1985.

JavaScript can use IEEE 754 to hold integers, but when a number becomes big or goes through an arithmetic operation such as division by 10 or by 3, it seems it can switch into floating-point mode. C and C++ use IEEE 754 only as double, and therefore only use the floating-point portion and not the "integer" portion. So did C and C++ leave the integer representations unused?

(And did C leave NaN, Infinity, -Infinity, and -0 unused? I recall never using them in C.)

Peter Cordes
nonopolarity
  • _"The main question is, did C and C++ leave a lot of representations of double and float unused?"_ No. When C++ uses IEEE-754, those values are all representable. See https://en.cppreference.com/w/cpp/types/numeric_limits – Drew Dormann Aug 13 '22 at 00:59
  • C and C++ don't specify the type of floating point used, but IEEE 754 is by far the most commonly encountered. – Mark Ransom Aug 13 '22 at 01:00
  • The C standard supports IEEE 754 (now known as IEC 60559) but doesn't require it. – sj95126 Aug 13 '22 at 01:02
  • You do seem to be an experienced member of Stack Overflow - do keep in mind that asking several different questions about multiple languages may get your question closed as _lacking focus_. – Drew Dormann Aug 13 '22 at 01:03
  • @DrewDormann that's because they all seem to tie to one answer. If the answer is one way or another, it answers one question and all the questions. This one is a tough one -- if I ask 3 separate questions, some users may point out that they all point to one answer. OK, I changed the last part from a question to something that may be able to lead to an answer – nonopolarity Aug 13 '22 at 01:06
  • Regarding other questions here - `std::sqrt(-1)` is an easy way to get one of the values you say you've never seen. "(-2^53 - 1) to (2^53 - 1)" also probably isn't a good way to describe the range of IEEE-754 numbers. "x / 2^n" where x and n are in a certain integer range perhaps better describes values representable by these types. The smaller `float` type can represent numbers much larger than 2^53 – Drew Dormann Aug 13 '22 at 01:11
  • `both are IEEE 754-1985` no, neither C nor C++ requires IEEE-754. And NaN and Inf definitely existed decades ago, although [checking them in standard C89 is a little bit more inconvenient](https://stackoverflow.com/q/59797359/995714); it's just that you don't know them, but that doesn't mean others don't use them, otherwise they wouldn't be in the IEEE-754 standard – phuclv Aug 13 '22 at 01:11
  • and obviously 123 and 123.0 have different representations. One is an integer in one's complement, two's complement, or sign-magnitude format; the other is in floating-point format – phuclv Aug 13 '22 at 01:13
  • Also (another question) some integer representations **are** unrepresentable in a same-sized IEEE-754. For very large numbers, only even integers can be represented by floating-point. For even larger numbers, only integers divisible by 4. And so on... For a certain byte size, there are integers that can't be represented by floating-point, and floating-points that can't be represented by integers. They use the same number of bytes to represent numbers in a different way. – Drew Dormann Aug 13 '22 at 01:16
  • I think `double` and `float` tend to be the native processor's `double` and `float` because they are the fastest, although they don't have to be (such as a layer or virtual machine), ... so I guess it is common for them to be IEEE 754, although I am not sure about other processors like Sun Sparc RISC, 68000, IBM Power microprocessors, or the ARM M1 and M2 – nonopolarity Aug 13 '22 at 01:21

2 Answers


If that's the case, isn't it true that the IEEE 754's representations of [integers and some special values] were all unused, as C and C++ didn't have the capability of referencing them?

This notion appears as if it might stem from the fact that JavaScript uses the IEEE-754 binary64 format for all numbers and performs (or at least defines) bitwise operations by converting the binary64 format to an integer format for the actual operation. (For example, a bitwise AND in JavaScript is defined, via the ECMAScript specification, as the AND of the bits obtained by converting the operands to a 32-bit signed integer.)

C and C++ do not use this model. Floating-point and integer types are separate, and values are not kept in a common container. C and C++ evaluate expressions based on the types of the operands and do so differently for integer and floating-point operations. If you have some variable x with a floating-point value, it has been declared as a floating-point type, and it behaves that way. If some variable y has been declared with an integer type, it behaves as an integer type.
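A minimal sketch of that separation (the variable names are only illustrative, not from the question): integer and floating-point operands are evaluated by their declared types, and bitwise operators, which JavaScript applies by converting behind the scenes, require an explicit conversion in C++.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    int    i = 7;
    double d = 7;   // same numeric value, but a different type and representation

    std::cout << i / 2 << "\n";   // integer division: prints 3
    std::cout << d / 2 << "\n";   // floating-point division: prints 3.5

    std::cout << (i & 3) << "\n";                             // bitwise AND on an integer: 3
    // std::cout << (d & 3) << "\n";                          // error: invalid operands (double)
    std::cout << (static_cast<std::int32_t>(d) & 3) << "\n";  // explicit conversion first: 3
    return 0;
}
```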

C and C++ do not specify that IEEE 754 is used, except that C has an optional annex that specifies the equivalent of IEEE 754 (IEC 60559), and C and C++ implementations may choose to use IEEE-754 formats and to conform to it. The IEEE-754 binary64 format is overwhelmingly used for double by C and C++ implementations, although many do not fully conform to IEEE-754.
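A program can ask whether the implementation claims IEC 60559 semantics. In C, the `__STDC_IEC_559__` macro plays that role; in C++ a sketch using `std::numeric_limits` looks like this (the output naturally depends on the compiler and target):

```cpp
#include <iostream>
#include <limits>

int main() {
    std::cout << std::boolalpha
              << "double is IEC 60559 (IEEE-754): "
              << std::numeric_limits<double>::is_iec559 << "\n"
              << "significand bits (including the implicit leading bit): "
              << std::numeric_limits<double>::digits << "\n";   // 53 for binary64
    return 0;
}
```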

In the binary64 format, the encoding consists of a sign bit S, an 11-bit “exponent” code E, and a 52-bit “significand” code, F (F for “fraction,” since S for significand is already taken by the sign bit). The value represented is:

  • If E is 2047 and F is not zero, the value represented is NaN. The bits of F may be used to convey supplemental information, and S remains an isolated sign bit.
  • If E is 2047 and F is zero, the value represented is +∞ or −∞ according to whether S is 0 or 1.
  • If E is neither 0 nor 2047, the value represented is (−1)^S • (1 + F/2^52) • 2^(E−1023).
  • If E is zero, the value represented is (−1)^S • (0 + F/2^52) • 2^(1−1023). In particular, when S is 1 and F is 0, the value is said to be −0, which is equal to but distinguished from +0.

These representations include all the integers from −2^53 − 1 to +2^53 − 1 (and more), both infinities, both zeros, and NaN.
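Here is a sketch (not part of the original answer) that splits a double into S, E, and F and prints them, assuming double is the 64-bit binary64 format:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");

// Split a binary64 value into its sign bit S, exponent code E, and significand code F.
static void decode(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);                 // reinterpret the object representation
    unsigned long long S = bits >> 63;
    unsigned long long E = (bits >> 52) & 0x7FF;         // 11-bit exponent code
    unsigned long long F = bits & 0xFFFFFFFFFFFFFULL;    // 52-bit significand code
    std::printf("%22.17g  S=%llu  E=%4llu  F=0x%013llX\n", d, S, E, F);
}

int main() {
    decode(123.0);                                        // an integer value: 0 < E < 2047
    decode(0.5);
    decode(-0.0);                                         // S=1, E=0, F=0
    decode(std::numeric_limits<double>::infinity());      // E=2047, F=0
    decode(std::numeric_limits<double>::quiet_NaN());     // E=2047, F != 0
    return 0;
}
```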

If a double has some integer value, say 123, then it simply has that integer value. It does not become an int and is not treated as an integer type by C or C++.

But from (-2^53 - 1) to (2^53 - 1), that's a lot of numbers unused…

There are no encodings unused in the binary64 format, except that one might consider the numerous NaN encodings wasted. Indeed many implementations do waste them by making them inaccessible or hard to access by programs. However, the IEEE-754 standard leaves them available for whatever purposes users may wish to put them to, and there are people who use them for debugging information, such as recording the program counter where a NaN was created.
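As a sketch of that idea (not taken from any particular debugger; the payload value is arbitrary, and whether a payload survives further operations is implementation-dependent), a program can build a quiet NaN with chosen payload bits and read them back:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // Build a quiet NaN whose low significand bits carry an arbitrary payload (here 0x1234).
    std::uint64_t bits = (0x7FFULL << 52)   // E = 2047
                       | (1ULL << 51)       // quiet-NaN bit in the significand
                       | 0x1234;            // payload
    double nan_with_payload;
    std::memcpy(&nan_with_payload, &bits, sizeof nan_with_payload);

    // Read the payload back out of the significand.
    std::uint64_t back;
    std::memcpy(&back, &nan_with_payload, sizeof back);
    std::printf("is NaN: %d, payload: 0x%llX\n",
                nan_with_payload != nan_with_payload,    // a NaN compares unequal to itself
                (unsigned long long)(back & ((1ULL << 51) - 1)));
    return 0;
}
```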

Eric Postpischil
  • are you saying all integers in IEEE 754 are just represented as floating point, but just that the "Exponent part" is `0`? So it is not like: IEEE 754 has a flag that says it is storing a number in the remaining 63 bits as an integer or as a floating point... ok, I guess that's correct, because otherwise we'd be able to have integers as big as `2 ** 62 - 1`, rather than `2 ** 53 - 1`... so that would mean, the bigger the number is, the less of a decimal part it can have? That actually blew my mind, as I always thought the decimal part is able to have a constant length no matter what. – nonopolarity Aug 13 '22 at 02:18
  • I just confirmed in JavaScript: `Number.MAX_SAFE_INTEGER` is `9007199254740991` and `Number.MAX_SAFE_INTEGER - 0.1` is `9007199254740991` -- this number is not capable of having even one decimal digit. I then removed the 3 least significant digits from that number and used `9007199254740`. `9007199254740 - 0.01` is `9007199254739.99` and `9007199254740 - 0.001` is `9007199254739.998`, but `9007199254740 - 0.0001` is `9007199254740`: it is no longer able to have decimal digits past the 3rd place – nonopolarity Aug 13 '22 at 02:23
  • my goodness... if that's the case, I wonder how many calculations for galaxies and stars made that mistake: `a * b * c - d`, where the programmer thought minus d would have an effect but in fact it did nothing because `a * b * c` is already quite big – nonopolarity Aug 13 '22 at 02:28
  • right, it doesn't need to be `9007199254740991`. I changed the leading `9` to `6` just to make sure the most significant bit is needed. Now `6007199254740991 - 0.1` is also `6007199254740991` and is also not capable of having one decimal place. There is also this interesting behavior that `6007199254740991 - 0.5` should be `6007199254740990.5` and therefore `6007199254740991`, but it in fact becomes `6007199254740990`. And then `6007199254740991 - 0.4999999999999999` is able to become `6007199254740991` but `6007199254740991 - 0.49999999999999999` becomes `6007199254740990` – nonopolarity Aug 13 '22 at 02:38
  • It's a bit misleading to say that Javascript performs "integer arithmetic" by first converting the operands to an integer format. It certainly doesn't do that for the usual arithmetic operators (+, -, /, ...). It does integer conversion (to a 32-bit signed integer, as you say) for bitwise boolean operators (`&`, `|`, `^`, `~`) and the various shift operators. – rici Aug 13 '22 at 05:24
  • @nonopolarity: `9007199254740 - 0.01` produces 9007199254739.990234375, not 9007199254739.99. JavaScript prints 9007199254739.99 because its default conversion of Number to string only produces enough digits to uniquely distinguish the Number, not all the digits needed to show the exact value of the Number. To understand floating-point, you should work or think in the base used for it, two, not in decimal. – Eric Postpischil Aug 13 '22 at 10:22
  • is that right... I just tried Ruby, Python3, and even Perl5, they all print out `9007199254740 - 0.01` as `9007199254739.99`. Is there a way to show `9007199254739.990234375` on any platform? – nonopolarity Aug 13 '22 at 15:43
  • @nonopolarity most languages have a formatting method that will let you specify more than the default precision. For example in Python the expression `f'{(9007199254740 - 0.01):25.25}'` returns `' 9007199254739.990234375'`. – Mark Ransom Aug 13 '22 at 18:48
  • ok... it is just an accuracy thing and how precisely we print it out... ok, I misread the numbers in @EricPostpischil 's comment. So I suppose in C, it'd be `printf()` with `"%060.30f"`, or not `f` but `lf`. In Ruby 2.8.5, it can be `"%060.30f" % [9007199254740 - 0.01]` and I do see it as `"00000000000000009007199254739.990234375000000000000000000000"`. I suspect that many languages intentionally do this: we know it cannot be that accurate, so we won't show you that many significant digits -- they show less precision so that the inaccuracy does not appear – nonopolarity Aug 13 '22 at 20:03
  • @nonopolarity: Re “we know it cannot be so accurate”: It **can** be so accurate. Each floating-point value other than NaN values represents one number exactly, by specification of the IEEE-754 standard. In floating-point arithmetic, it is the operations that approximate real arithmetic, not the numbers that approximate real numbers. Each operation produces a result that is approximately (or exactly) the result the corresponding real-number arithmetic would produce, but each floating-point value is exactly one real number. – Eric Postpischil Aug 14 '22 at 02:35
  • @nonopolarity: Understanding this distinction is crucial for analyzing, designing, proving, and debugging floating-point software. The number represented by the floating-point value 9007199254739.990234375 is exactly 9007199254739.990234375, and it might even be the number the user of some software intended to compute. Software that prints “9007199254739.99” for it may do so because that is enough to uniquely identify the number, or because the software author figures that is all the user might care about, or because they were lazy, but they should not do so because the number is “inaccurate.” – Eric Postpischil Aug 14 '22 at 02:37

The int number 123 is exactly the same as the double number 123.0, as you can easily see by testing 123 == 123.0. Their representations are different internally though.
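A small sketch that makes the different internal representations visible (assuming a typical platform with a 32-bit int and an IEEE-754 binary64 double; not part of the original answer):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    int    i = 123;
    double d = 123.0;

    // The comparison converts i to double first, so the values compare equal.
    std::printf("i == d: %s\n", i == d ? "true" : "false");

    std::uint32_t ibits;
    std::uint64_t dbits;
    std::memcpy(&ibits, &i, sizeof ibits);
    std::memcpy(&dbits, &d, sizeof dbits);
    std::printf("int    123   bits: 0x%08X\n", ibits);                          // 0x0000007B
    std::printf("double 123.0 bits: 0x%016llX\n", (unsigned long long)dbits);   // 0x405EC00000000000
    return 0;
}
```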

Mark Ransom
  • "Their representations are different internally though" so that means we in fact left all those integer representations unused? – nonopolarity Aug 13 '22 at 01:16
  • @nonopolarity no they're not unused, they're just represented by a completely different bit pattern. – Mark Ransom Aug 13 '22 at 01:17
  • but I'd suppose the `123` representation would never occur in a C or C++ program using `double`? – nonopolarity Aug 13 '22 at 01:24
  • @nonopolarity are you asking if the value `123` - stored as an integer - would have some usable value if it were interpreted as the bits of a floating-point number? Yes. It would be a different number. Approximately `1.7236e-43` – Drew Dormann Aug 13 '22 at 01:26
  • @nonopolarity who said that? In `123 == 123.0` first `123` is converted to `double` then the 2 double values are compared to each other. There are no reasons for 123 to be stored as 123.0 and the compiler is also forbidden from doing that, one of the reasons is that many integer values aren't representable in floating-point – phuclv Aug 13 '22 at 01:30