
For types T for which std::is_floating_point<T>::value is true, does the C++ standard specify anything about the way T should be implemented?

For example, does T even have to follow a sign/mantissa/exponent representation? Or can it be completely arbitrary?

Peter Mortensen
Vincent

4 Answers


From N3337:

[basic.fundamental/8]: There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.

If you want to check whether your implementation uses IEEE-754, you can use std::numeric_limits<T>::is_iec559:

static_assert(std::numeric_limits<double>::is_iec559,
              "This code requires IEEE-754 doubles");

There are a number of other helper traits in this area, such as has_infinity, quiet_NaN and more.
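
For example, a small sketch (assuming a hosted implementation so that <iostream> is available) that queries a few of these members might look like this:

#include <iostream>
#include <limits>

int main() {
    using dl = std::numeric_limits<double>;
    std::cout << std::boolalpha
              << "is_iec559:     " << dl::is_iec559     << '\n'  // true if IEEE-754 is claimed
              << "has_infinity:  " << dl::has_infinity  << '\n'
              << "has_quiet_NaN: " << dl::has_quiet_NaN << '\n';
    if (dl::has_quiet_NaN) {
        std::cout << "quiet_NaN:     " << dl::quiet_NaN() << '\n';  // typically prints "nan"
    }
}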

TartanLlama
  • Your quote does not really answer the question. The question asks if there are restrictions on allowed floating point formats. Your answer shows that the floating point formats must be documented by the implementor. It's related but not really the same thing. –  Dec 15 '15 at 17:05
  • @hvd: That (implementor must provide documentation) IS the only restriction, apart from the required ranges and operations which the math library must implement. – Ben Voigt Dec 15 '15 at 17:19
  • @hvd I'm not sure I understand. Would a simple "this is the only restriction" in the answer be sufficient for you? – TartanLlama Dec 15 '15 at 17:22
  • @TartanLlama If it actually were the only restriction, then it would be fine, but it's not. The members of `std::numeric_limits` are defined in a way that slightly limits the possible representations of `double`. (And same goes for all other floating point types.) The requirement to be able to call `extern "C"` functions taking a pointer to a C++ `double` further restricts the possible representations to those allowed by C. –  Dec 15 '15 at 18:01
  • @hvd What is the exact guarantee for what happens once you call the `extern "C"` function compiled by a C compiler? I.e., what ABI guarantees does C++ actually provide when interacting with `extern "C"`? In practice, lots; in theory, how much? Second, is it illegal to have `int`-like `numeric_limits` for a floating point type? – Yakk - Adam Nevraumont Dec 15 '15 at 21:16
  • @Yakk "Every implementation shall provide for linkage to functions written in the C programming language, "C", and linkage to C++ functions, "C++"." ([dcl.link]p3) requires a call from a C++ program to a function written in C to actually work, does it not? It may require a specific C compiler to be used, a C++ implementation can obviously not be compatible with every C implementation out there at the same time, but if it's not compatible with any C implementation, then I'd say it doesn't meet that requirement. –  Dec 15 '15 at 21:32
  • @Yakk As for `numeric_limits`, one restriction that's clear from the standard is that the radix must be a constant. If "implementation-defined" were the only requirement, then a floating-point number might have a bit indicating whether it's a decimal or a binary floating-point number. (Which would be silly, I know.) Your specific example, `int`-like `numeric_limits`, is less clear. There's a footnote that the `numeric_limits` members are the same as C's `DBL_MIN_EXP` and others, which adds requirements again, but footnotes aren't normative. –  Dec 15 '15 at 21:37
  • Is zero not required to be all zeros? – user541686 Dec 16 '15 at 06:15
  • @Mehrdad No, it isn't. For integer types, all bits zero must represent zero, but it doesn't need to be the only representation of zero. For floating point types, all bits zero has no special meaning. –  Dec 16 '15 at 06:46
  • @hvd I could have forgotten something, but I *think* all-bits-zero is only required to represent zero for *unsigned* integer types. – zwol Dec 16 '15 at 15:10
  • @zwol It comes from the C standard, which states "For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type." –  Dec 16 '15 at 22:07
  • @hvd Aha. That sentence is new in C11, is why I didn't know about it. (I changed jobs in 2005 and no longer needed to have the C standard memorized, so I haven't close-read C11.) C++, however, contains no such requirement AFAICT; in fact, it appears to me that C++ still hasn't been harmonized with the *C99*, never mind C11, definition of integer types. Blech. – zwol Dec 16 '15 at 22:36
  • @zwol It's pre-C11, it's one of the C99 TCs, so it already applies to C++ indirectly for the same reason I commented earlier about floating point types: the required support for `extern "C"` makes it impossible for the C++ representation of fundamental types to be something that C doesn't allow. But yeah, the C++ standard should really be explicit about it. –  Dec 16 '15 at 22:40
  • @zwol C99 6.2.6.2/2 requires that if the sign bit is 0, the value bits must be the same as they would were the same value stored in an unsigned type. It does permit padding bits, the value of which are unspecified, so while all (sign or value) bits zero is necessarily zero, it may be possible that all-bits-zero would be a trap representation in some implementations that have padding bits. – Ray Dec 18 '15 at 00:14

The C standard has an "annex" (in C11 it's Annex F) which lays out what it means for an implementation of C to be compliant with IEC 60559, the international (IEC) edition of IEEE 754. An implementation that conforms to Annex F must have IEEE-representation floating point numbers. However, implementing this annex is optional; the core standard specifically avoids saying anything about the representation of floating point numbers.

I do not know whether there is an equivalent annex for C++. It doesn't appear in N3337, but that might just mean it's distributed separately. The existence of std::numeric_limits<floating-type>::is_iec559 indicates that the C++ committee at least thought about this, but perhaps not in as much detail as the C committee did. (It is and has always been a damned shame that the C++ standard is not expressed as a set of edits to the C standard.)
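
As a compile-time sketch (not a full conformance check, since is_iec559 only reports what the implementation claims), the trait can be used to refuse to build on implementations that do not advertise IEC 60559 representations for all three floating-point types:

#include <limits>

// Rejects implementations that do not claim IEC 60559 / IEEE 754
// representations for the standard floating-point types.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEC 60559");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEC 60559");
static_assert(std::numeric_limits<long double>::is_iec559, "long double is not IEC 60559");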

zwol

No particular implementation is required. The C++ standard doesn't talk about it much at all. The C standard goes into quite a bit of detail about the conceptual model assumed for floating point numbers, with a sign, exponent, significand in some base b, and so on. It, however, specifically states that this is purely descriptive, not a requirement on the implementation (C11, footnote 21):

The floating-point model is intended to clarify the description of each floating-point characteristic and does not require the floating-point arithmetic of the implementation to be identical.

That said, although the details can vary, at least offhand it seems to me that producing (for example) a conforming implementation of double that didn't fit fairly closely with the usual model (i.e., a significand and exponent) would be difficult (or at least difficult to do with competitive performance, anyway). It wouldn't be particularly difficult to have it vary in other ways though, such as rearranging the order, or using a different base.

The definitions of std::numeric_limits<T>::digits (and std::numeric_limits<T>::digits10) imply fairly directly that what's listed as a floating point type must retain (at least approximately) the same precision for all numbers across a fairly wide range of magnitudes. By far the most obvious way to accomplish that is to have some number of bits/digits devoted to a significand, and some other (separate) set of bits devoted to an exponent.
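
Printing those members shows the parameters of that model for a given implementation (the values are implementation-defined; on a typical IEEE-754 platform, double reports radix 2 and 53 significand digits):

#include <iostream>
#include <limits>

int main() {
    typedef std::numeric_limits<double> lim;
    std::cout << "radix:        " << lim::radix        << '\n'  // base b of the representation
              << "digits:       " << lim::digits       << '\n'  // significand digits, counted in base radix
              << "digits10:     " << lim::digits10     << '\n'  // decimal digits preserved without change
              << "min_exponent: " << lim::min_exponent << '\n'
              << "max_exponent: " << lim::max_exponent << '\n';
}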

Jerry Coffin
  • I think the definitions of `numeric_limits::digits` and `digits10` make it difficult to get away from there being a significand in there somewhere (how the bit representation of it works is another matter). So at the very least you can't smuggle a fixed-width type through as `double` and conform. I suspect as you do, that it's constrained to be *floating* point. Using a different base is explicitly covered by `radix` :-) – Steve Jessop Dec 15 '15 at 18:47
  • @SteveJessop: Yeah, that was pretty much my thinking as well. – Jerry Coffin Dec 15 '15 at 18:51
  • @SteveJessop What exactly prevents you from copy-pasting `int` for almost every property of `numeric_limits`? – Yakk - Adam Nevraumont Dec 15 '15 at 21:20
  • @Yakk An implementation could conceivably provide only one size of integer and one size of float (i.e. `char`, `short`, `int`, `long`, and `long long` are all the same size and have the same limits, and similarly `float`, `double` and `long double` are all the same size and have the same limits) but I do not think the requirements for an integer type and the requirements for a floating type can be satisfied simultaneously. – zwol Dec 15 '15 at 21:27
  • @gnasher729: For `float` you need to maintain 6 decimal digits of precision from 10^-37 to 10^+37. That's about 80 decimal digits total. At ~3.5 bits/digit, you'd need about 280 bits. For `double` you need to maintain 10 decimal digits, increasing the minimum fixed point size to ~300 bits. – Jerry Coffin Dec 16 '15 at 00:37
  • Actually, I think `numeric_limits::is_exact == true` would apply to fixed point and decimal types. I think `numeric_limits` *could* support fixed point. The digits and digits10 represent the length of the significand - this would apply to fixed nums just fine. `is_bounded==true`, `has_*_NaN==false`, `has_infinity==false`, `has_denorm==none`... – emsr Dec 16 '15 at 03:24
  • @zwol Right. I think the only requirements are that `double` can't have *less* precision than `float`, etc. – emsr Dec 16 '15 at 03:26
  • @emsr: There are *slightly* more stringent requirements than that. `float` has to have at least 6 decimals digits of precision, and `double` and `long double` need at least 10. – Jerry Coffin Dec 16 '15 at 17:28

The idea of std::is_floating_point is to make user code of different origins work together better. Technically you can specify an int as std::is_floating_point without causing undefined behavior. But say you have some templated library that has to divide repeatedly by a value n of type T. To speed things up, the library computes T ni = 1 / n once and replaces every division by n with a multiplication by ni. This works great for floating-point numbers, but fails for integers, where 1 / n is usually 0. The library therefore correctly performs the optimization only if std::is_floating_point<T>::value == true. If you lie, the code probably still works from the standard's point of view, but it is incorrect from a logical point of view. So if you write a class that behaves like a bigger float, mark it as std::is_floating_point; otherwise don't. This should get you code that is both optimal and correct.
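
A minimal sketch of that kind of library code, using C++11 tag dispatch (the name divide_all and its interface are made up for illustration):

#include <cstddef>
#include <type_traits>
#include <vector>

// Floating-point case: compute the reciprocal once and multiply,
// which is typically cheaper than dividing inside the loop.
template <typename T>
void divide_all_impl(std::vector<T>& v, T n, std::true_type /*is_floating_point*/) {
    T ni = T(1) / n;
    for (std::size_t i = 0; i < v.size(); ++i) v[i] *= ni;
}

// Integer (or other) case: 1 / n would usually be 0, so divide directly.
template <typename T>
void divide_all_impl(std::vector<T>& v, T n, std::false_type) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] /= n;
}

template <typename T>
void divide_all(std::vector<T>& v, T n) {
    divide_all_impl(v, n, typename std::is_floating_point<T>::type());
}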

nwp