C++ Portable Floating-Point Bit Representation?

Question

Is there a C++ Standards compliant way to determine the structure of a 'float', 'double', and 'long double' at compile-time ( or run-time, as an alternative )?

If I assume std::numeric_limits< T >::is_iec559 == true and std::numeric_limits< T >::radix == 2, I suspect this is possible by the following rules:

first X-bits are the significand.
next Y-bits are the exponent.
last 1-bit is the sign-bit.

with the following expressions vaguely like:

size_t num_significand_bits = std::numeric_limits< T >::digits;
size_t num_exponent_bits = log2( 2 * std::numeric_limits< T >::max_exponent );
size_t num_sign_bits = 1u;

except I know

std::numeric_limits< T >::digits includes the "integer bit", whether or not the format actually explicitly represents it, so I don't know how to programmatically detect and adjust for this.
I'm guessing std::numeric_limits< T >::max_exponent is always 2^(num_exponent_bits)/2.
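The expressions above can be made concrete for `float` and `double`. The following is only a sketch, assuming a binary IEC 559 type whose leading significand bit is implicit; the helper names `floor_log2`, `exponent_bits` and `stored_significand_bits` are made up for illustration and are not part of the standard library.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <limits>    // std::numeric_limits

// Integer log2, written recursively so it stays a valid C++11 constexpr function.
constexpr std::size_t floor_log2(unsigned long v)
{
    return v <= 1 ? 0 : 1 + floor_log2(v >> 1);
}

// Exponent field width, assuming max_exponent == 2^(exponent_bits - 1).
template <typename T>
constexpr std::size_t exponent_bits()
{
    static_assert(std::numeric_limits<T>::is_iec559, "IEC 559 type required");
    static_assert(std::numeric_limits<T>::radix == 2, "binary radix required");
    return floor_log2(2ul * static_cast<unsigned long>(std::numeric_limits<T>::max_exponent));
}

// Significand bits actually stored, assuming the "integer bit" is implicit.
template <typename T>
constexpr std::size_t stored_significand_bits()
{
    return static_cast<std::size_t>(std::numeric_limits<T>::digits) - 1;
}

// Sanity check: sign + exponent + stored significand should fill the object.
static_assert(1 + exponent_bits<float>() + stored_significand_bits<float>()
                  == sizeof(float) * CHAR_BIT, "unexpected float layout");
static_assert(1 + exponent_bits<double>() + stored_significand_bits<double>()
                  == sizeof(double) * CHAR_BIT, "unexpected double layout");
```

The implicit-bit assumption does not hold for the x87 80-bit `long double`, which stores its integer bit explicitly, so `digits - 1` would undercount there. Comparing the computed total against `sizeof(T) * CHAR_BIT` is one way to notice that a type does not fit the assumed layout.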

Background: I'm trying to overcome two issues portably:

set/get which bits are in the significand.
determine where the end of 'long double' is so I know not to read the implicit padding bits that'll have uninitialized memory.

I just saw [Question 10620601](http://stackoverflow.com/questions/10620601/portable-serialisation-of-ieee754-floating-point-values) which uses a seemingly Posix header `ieee754.h` that defines structs with bit-field specifiers for everything. I like that idea, but I'm unsure if that's really portable. — Charles L Wilcox, Mar 08 '13 at 18:47
`Portable Floating-Point Bit Representation?` Yes, the ASCII representation would be portable to any language and any OS. I believe you might have an [XY Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem), what are you trying to do? — Lie Ryan, Mar 08 '13 at 18:49
Yes, I'm thinking in the solution-space of my already decomposed problem. I want to set and test the payload of a signaling-nan; I want to set and use a signaling-nan as a "null value" for floating-point numbers; however, I'd like to distinguish it from other NaNs produced by the system otherwise. `nan(char const*)` exists only for quiet-nan, and payload format is not portable. I need to do "equality testing", but obviously `operator==` for any NaN on either side returns false, so I have to test the underlying binary representation, while avoiding `long double`'s uninitialized packing-bits. — Charles L Wilcox, Mar 08 '13 at 18:55
@MatsPetersson `operator==` can't compare all floating-point values, namely qNaN and sNaN. — Charles L Wilcox, Mar 08 '13 at 19:07
So, basically, you want NaN == NaN to return true? How about `if (isnan(a) && isnan(b)) return true;`? — Mats Petersson, Mar 08 '13 at 19:08
I want that and more; I want `isnan(a) && isnan(b) && ( significand(a) == significand(b) )`. — Charles L Wilcox, Mar 08 '13 at 19:11
And you want this to work on IBM mainframes, PC's with SSE and x87 fpu, DEC Alpha, microcontrollers made by some unknown Taiwan company, and everything else, yes? I think you might just as well try come up with the answer to world starvation - in fact, that's probably a lot easier.... :) — Mats Petersson, Mar 08 '13 at 19:18
@CharlesLWilcox: what is this "payload" for? Are you sending the numbers to another machine? or saving it to a file? Or is this an IPC in the same system? Why can't you just use a real NULL value, or use a separate field to indicate nullness, or use `std::numeric_limits::max();` or `std::numeric_limits::min();` as your marker value? Relying on the bit pattern of floating point value is always going to be non-portable because IIRC C++ standard does not define the bit representation of floating point numbers. — Lie Ryan, Mar 08 '13 at 19:38
@LieRyan The "payload" of the sNaN is to distinguish it from other sNaNs other code could generate. I can't use "a real NULL", since this is not a pointer, but the primitive floating-point value itself; all I can do is use a specific value to represent the "null value". NaN's are effectively designed to represent a "null", "invalid", or "indeterminate" floating-point values. — Charles L Wilcox, Mar 08 '13 at 19:52
@LieRyan Since there is no standard way of creating a sNaN with a specific payload, I have to create the bit-representation directly/manually. It is portable if it follows a spec (IEEE-754/IEC-559), and if one can check for that spec via C++ standard APIs ( `numeric_limits::is_iec559` ). — Charles L Wilcox, Mar 08 '13 at 20:07

score 5 · Accepted Answer · answered Mar 08 '13 at 19:14

In short, no. If std::numeric_limits<T>::is_iec559, then you know the format of T, more or less: you still have to determine the byte order. For anything else, all bets are off. (The other formats I know that are still being used aren't even base 2: IBM mainframes use base 16, for example.) The "standard" arrangement of an IEC floating point has the sign on the high order bit, then the exponent, and the mantissa on the low order bits; if you can successfully view it as a uint64_t, for example (via memcpy, reinterpret_cast or union; memcpy is guaranteed to work, but is less efficient than the other two), then:

for double:

uint64_t tmp;
memcpy( &tmp, &theDouble, sizeof( double ) );
bool isNeg = (tmp & 0x8000000000000000) != 0;
int  exp   = (int)( (tmp & 0x7FF0000000000000) >> 52 ) - 1022 - 53;
// 53 significant bits do not fit in a 32-bit long, so use a 64-bit type.
uint64_t mant = (tmp & 0x000FFFFFFFFFFFFF) | 0x0010000000000000;

for float:

uint32_t tmp;
memcpy( &tmp, &theFloat, sizeof( float ) );
bool isNeg = (tmp & 0x80000000) != 0;
int  exp   = (int)( (tmp & 0x7F800000) >> 23 ) - 126 - 24;
uint32_t mant = (tmp & 0x007FFFFF) | 0x00800000;
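As a self-contained illustration of the `double` case, here is a rough sketch that splits a value into the three fields and rebuilds it with `std::ldexp`. It assumes `double` is IEC 559 binary64 and that integers and floating-point values share the same byte order; the names `DoubleParts`, `decompose` and `recompose` are hypothetical. It only handles normal values, since zero, subnormals, infinities and NaNs have no implicit leading 1.

```cpp
#include <cassert>
#include <cmath>     // std::isnormal, std::ldexp
#include <cstdint>   // std::uint64_t
#include <cstring>   // std::memcpy
#include <limits>

struct DoubleParts
{
    bool          negative;
    int           exponent;  // power of two applied to mantissa
    std::uint64_t mantissa;  // 53-bit integer, implicit leading 1 included
};

inline DoubleParts decompose(double value)
{
    static_assert(std::numeric_limits<double>::is_iec559, "binary64 expected");
    static_assert(sizeof(double) == sizeof(std::uint64_t), "size mismatch");
    assert(std::isnormal(value));  // precondition of this sketch

    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    DoubleParts p;
    p.negative = (bits >> 63) != 0;
    // 1075 == 1022 + 53: exponent bias plus the shift that makes mantissa an integer.
    p.exponent = static_cast<int>((bits >> 52) & 0x7FF) - 1075;
    p.mantissa = (bits & 0x000FFFFFFFFFFFFFull) | 0x0010000000000000ull;
    return p;
}

inline double recompose(const DoubleParts& p)
{
    const double magnitude = std::ldexp(static_cast<double>(p.mantissa), p.exponent);
    return p.negative ? -magnitude : magnitude;
}
```

With this convention the value equals `mantissa * 2^exponent`, so `1.0` comes out as mantissa `2^52` and exponent `-52`.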

With regards to long double, it's worse, because different compilers treat it differently, even on the same machine. Nominally, it's ten bytes, but for alignment reasons, it may in fact be 12 or 16. Or just a synonym for double. If it's more than 10 bytes, I think you can count on it being packed into the first 10 bytes, so that &myLongDouble gives the address of the 10 byte value. But generally speaking, I'd avoid long double.

answered Mar 08 '13 at 19:14

James Kanze

I've been tinkering with a `union { unsigned long words[ ( sizeof( T ) -1 ) / sizeof(unsigned long) + 1]; T value; };` to inspect the bits. – Charles L Wilcox Mar 08 '13 at 19:20
Yes, I've read that 'long double' is actually 'double' on Windows 64-bit. On my GNU/Linux 64-bit, it has the +2 packing-bytes to get to 12-bytes. – Charles L Wilcox Mar 08 '13 at 19:21
Heh, "just don't bother for `long double`"; I hadn't seriously considered that before. For `float` and `double`, if not `is_iec559`, I don't try to pack the significand with a bit-pattern; I could ignore packing the significand for `long double`, no matter the `is_iec559`ness. – Charles L Wilcox Mar 08 '13 at 19:24
@CharlesLWilcox That's a strange union. The classical union would be something like: `union { unsigned char image[ sizeof(T) ]; T value; }`. This still means that even on platforms with IEC, you have to consider endianness. Since IEC guarantees that `double` is 64 bits, and is not implemented on any platform which doesn't have `uint64_t`, those are the two types I'd alias. As for how to alias: `memcpy` works everywhere (but is usually significantly slower); the `union` or the `reinterpret_cast` will normally work in specific situations, but require more care. – James Kanze Mar 10 '13 at 13:26
I was using `unsigned long` as a guess for the native word-size, for faster math. ( `std::bitset` on GNU/Linux happens to use this, and I was looking at that around the same time. ) Yes, the `uint8_t chars[ sizeof(T) ]` would be much more explicit / straightforward. – Charles L Wilcox Mar 10 '13 at 19:12

Mats Petersson · Answer 2 · 2013-03-08T19:03:56.357

1

I would say that the only portable way is to store the number as a string. This does not rely on "interpreting bit patterns".

Even if you know how many bits something is, that doesn't mean it has the same representation: is the exponent zero-based or biased? Is there an invisible 1 at the front of the mantissa? The same applies to all of the other parts of the number. And it gets even worse for BCD encoded or "hexadecimal" floats, which are available in some architectures...

If you are worried about uninitialized bits in a structure (class, array, etc), then use memset to set the entire structure to zero [or some other known value].

edited Mar 08 '13 at 19:03

answered Mar 08 '13 at 18:55

Mats Petersson

+ mantissa representation (2's complement or sign-magnitude), special values (INFINITY, NAN), subnormal values, order of bits. And there may be some more intricacies. – Alexey Frunze Mar 08 '13 at 19:01
I really don't want to serialize to/from string just to set a specific value, and test for that value in other `T`'s later on. (Also, setting a signaling-NaN value via `scanf` or `operator>>` may not be portable.) I need to write a function to test for this pattern in a user's `T`; `memset`ing a local `T`, assigning the user-value to the local, then inspecting the bits seems a bit over-complicated. – Charles L Wilcox Mar 08 '13 at 19:03
Um, a generic solution is going to be even more complicated. Why not develop a platform-specific solution instead and include a check to see that this is the right platform? – Alexey Frunze Mar 08 '13 at 19:12
@AlexeyFrunze Thus my preconditions of floats really being IEEE754/IEC559, and the radix really being '2'. I'm writing a small library; useful libraries are portable. – Charles L Wilcox Mar 08 '13 at 19:14
@CharlesLWilcox If the floats are IEC, then the radix must be 2. – James Kanze Mar 08 '13 at 19:15
@JamesKanze Ahh, thanks. I was being a bit sloppy in not verifying that. :-) – Charles L Wilcox Mar 08 '13 at 19:25
@JamesKanze - well, yes, but at this point that's hyper-technical. IEC-559 has been merged into IEEE-754, and IEEE-754 as of 2008 provides **both** binary and decimal floating-point. Unfortunately, the C++ standard chose the wrong name for `numeric_limits::is_iec559`. – Pete Becker Mar 08 '13 at 20:08
@JamesKanze - a little followup: I left out ISO/IEC 60559, which was most recently revised in 2011, and is, apparently, identical to IEEE 754. – Pete Becker Mar 08 '13 at 20:24
@PeteBecker Yes, but do you know of a platform where IEC double is decimal? (I had, in fact, forgotten that possibility, because I've never actually seen it.) – James Kanze Mar 10 '13 at 13:27
@JamesKanze - not specifically, but IBM **really** liked decimal floating point. Hence the TR, which is now being considered for incorporation into the C++ standard. – Pete Becker Mar 10 '13 at 15:46
@Pete Probably because IBM produces hardware with decimal FPUs. – Tim Seguine Sep 26 '16 at 10:20

Charles L Wilcox · Answer 3 · 2013-03-11T18:47:04.580

For posterity, this is what I ended up doing.

To generate and test for my IEEE-754 signaling-NaN values, I use this pattern for 'float' and 'double'.

#include <cstddef> // size_t
#include <cstdint> // uint32_t, uint64_t
#include <limits> // numeric_limits

union IEEE754_Float_Union
{
    float value;
    uint32_t bits;
};

float generate_IEEE754_float()
{
    IEEE754_Float_Union u = { -std::numeric_limits< float >::signaling_NaN() };
    size_t const num_significand_bits_to_set = std::numeric_limits< float >::digits
                                               - 1 // implicit "integer-bit"
                                               - 1; // the "signaling-bit"
    u.bits |= ( static_cast< uint32_t >( 1 ) << num_significand_bits_to_set ) - 1;
    return u.value;
}

bool test_IEEE754_float( float const& a_r_val )
{
    IEEE754_Float_Union const u = { a_r_val };
    IEEE754_Float_Union const expected_u = { generate_IEEE754_float() };
    return u.bits == expected_u.bits;
}

For 'long double', I use the 'double' functions with casting. Specifically, I generate the 'double' value and cast it to 'long double' before it's returned, and I test the 'long double' by casting to 'double' then testing that value. My idea is that, while the 'long double' format can vary, casting a 'double' into a 'long double', then casting it back to 'double' later on should be consistent ( i.e. not lose any information ).

Just FYI, this violates the strict aliasing rule of C++. It should "probably" work in any compiler that also supports standard C, though. Just don't be surprised if it doesn't. — Tim Seguine, Sep 26 '16 at 10:32
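One way to sidestep the aliasing concern is to copy the bytes with `std::memcpy` instead of reading the inactive union member. The following is only a sketch of that variant, not the code from the answer above: it assumes `float` is IEC 559 binary32, omits the sign flip, and uses hypothetical names (`float_bits`, `float_from_bits`, `make_marker_nan`, `is_marker_nan`). Whether a signaling NaN survives being passed around unchanged depends on the platform, so treat that as an assumption too.

```cpp
#include <cassert>
#include <cstdint>   // std::uint32_t
#include <cstring>   // std::memcpy
#include <limits>    // std::numeric_limits

static_assert(std::numeric_limits<float>::is_iec559, "binary32 expected");
static_assert(sizeof(float) == sizeof(std::uint32_t), "size mismatch");

// Read the object representation without aliasing through another type.
inline std::uint32_t float_bits(float const& value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

inline float float_from_bits(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Signaling NaN with every payload bit below the "signaling-bit" set.
inline float make_marker_nan()
{
    std::uint32_t bits = float_bits(std::numeric_limits<float>::signaling_NaN());
    int const payload_bits = std::numeric_limits<float>::digits
                             - 1   // implicit "integer-bit"
                             - 1;  // the "signaling-bit"
    bits |= (static_cast<std::uint32_t>(1) << payload_bits) - 1;
    float const result = float_from_bits(bits);
    assert(result != result);  // any NaN compares unequal to itself
    return result;
}

inline bool is_marker_nan(float const& value)
{
    float const marker = make_marker_nan();
    return float_bits(value) == float_bits(marker);
}
```

The test compares integer bit patterns, so it distinguishes the marker from other NaNs where `operator==` cannot.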
