Are IEEE float and double guaranteed to be the same size on any OS?

Question

I'm working on an OS-portable database system. I want our database files to be portable across operating systems so that customers can move them to another OS at their discretion. For this use case I need my data types to be consistent across operating systems, and I'm wondering whether IEEE floats and doubles are guaranteed to have the same byte size on any OS.

IEEE 754 data types are platform-agnostic by definition. But the C++ `float` and `double` types are not guaranteed to be IEEE 754 `binary32` and `binary64`. I assume you're more interested in the latter? — , Jun 11 '14 at 07:37
You might want to add the case of `CHAR_BIT != 8` to your question. Most answers here will probably tell you that `float` is guaranteed to be 32-bit long and `double` is guaranteed to be 64-bit long. But what if, for example, `CHAR_BIT` is defined as `16`? — barak manos, Jun 11 '14 at 07:42
@PaulR Thanks, Paul, I'm handling that by swapping bytes inside the storage engine and always making sure that data is stored in a little endian byte order. — , Jun 11 '14 at 07:52
This pertains more to your specific scenario, but you may be best off with storing your values using arbitrary precision floating point numbers, which aren't hard to implement yourself. Simply multiply the floating point number by a power of two such that it can fit as the biggest possible integer inside your allotted storage, then store the integer along with the power of two you multiplied it by. Doing it this way will guarantee that the file is readable by any architecture. I could write a simple example if you desire. — Kaslai, Jun 11 '14 at 08:20
I thought it would be a fun exercise to implement a basic version of what I talked about. A more robust implementation would permit for larger exponents and the use of long double, but that should be left to someone who has serious needs for such details. If all you need is to be able to store the entirety of a double though, this should be sufficient. http://pastebin.com/6UVTi55d — Kaslai, Jun 11 '14 at 09:30
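The pastebin code is not reproduced here. What follows is only a minimal sketch of the idea Kaslai describes, storing an integer together with a power of two. The names `PortableReal`, `encode_real` and `decode_real` are hypothetical, and the sketch assumes a binary `double` with at most 63 significand bits and finite input.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Assumption: double is a radix-2 type whose significand fits in 63 bits.
static_assert(std::numeric_limits<double>::radix == 2 &&
              std::numeric_limits<double>::digits <= 63,
              "this sketch needs a binary double with at most 63 significand bits");

// Hypothetical storage form: value == mantissa * 2^exponent.
struct PortableReal {
    std::int64_t mantissa;
    std::int32_t exponent;
};

PortableReal encode_real(double value) {
    assert(std::isfinite(value));  // NaN and infinities need separate handling
    const int digits = std::numeric_limits<double>::digits;
    int exp = 0;
    // frexp splits value into frac * 2^exp with 0.5 <= |frac| < 1 (or frac == 0).
    const double frac = std::frexp(value, &exp);
    // Scaling by 2^digits turns frac into an exactly representable integer.
    const auto mantissa = static_cast<std::int64_t>(std::ldexp(frac, digits));
    return {mantissa, static_cast<std::int32_t>(exp - digits)};
}

double decode_real(const PortableReal& p) {
    // ldexp multiplies by a power of two; for values produced by encode_real
    // on the same format this reconstructs the original exactly.
    return std::ldexp(static_cast<double>(p.mantissa), p.exponent);
}
```

The two integers still have to be written with a fixed width and byte order. The sign of negative zero is lost in this form, and a reader whose `double` is narrower than the writer's would round on decode.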

score 6 · Accepted Answer · edited Jun 20 '20 at 09:12

C++ says almost nothing about the representation of floating point types.

[basic.fundamental]/8 says (Emphasis mine):

There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. Integral and floating types are collectively called arithmetic types. Specializations of the standard template std::numeric_limits (18.3) shall specify the maximum and minimum values of each arithmetic type for an implementation.

If you just write C++ code using float, double and long double, you have virtually no guarantees, apart from those given in the documentation for your particular compiler, and those that can be implied from std::numeric_limits.

On the other hand, IEEE 754 provides exact definitions of the behaviour and binary representation of its floating point types. These definitions are not quite enough to guarantee identical behaviour on all IEEE 754 platforms, since (for example) IEEE 754 sometimes allows multiple operations to be folded together when the result would be more precise than performing the two operations separately. This is likely to be unimportant to your specific case, since you just want the files to be portable, and probably do not care quite as much about identical queries creating identical changes to the files on different platforms as you do about identical files being loaded in identical ways on different platforms.
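As a hedged illustration of the folding mentioned above, the sketch below compares a fused multiply-add with the same expression written as two operations. The function names `fused` and `separate` are made up for this example. Whether `separate` is contracted into a single fused operation depends on the compiler, its flags and the target, so its result is deliberately not stated.

```cpp
#include <cmath>

// a*b + c with a single rounding at the end (fused multiply-add).
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// a*b + c as written in source. An implementation may round after the
// multiplication, or may contract the expression into one fused operation.
double separate(double a, double b, double c) {
    return a * b + c;
}

// Inputs chosen so that the exact product 1 - 2^-54 is not representable
// as an IEEE 754 binary64 value and must be rounded if stored on its own.
double example_a() { return 1.0 + std::ldexp(1.0, -27); }
double example_b() { return 1.0 - std::ldexp(1.0, -27); }
```

With IEEE 754 binary64, `fused(example_a(), example_b(), -1.0)` yields exactly -2^-54, because the product is never rounded on its own. If the multiplication is rounded first, the product becomes 1.0 and the sum is 0.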

So the question is: "how do I get a portable IEEE 754 implementation for C++?".

The answer to this question is somewhat tricky. Most C++ compilers for reasonable platforms will provide at least float and double that approximately match IEEE 754's binary32 and binary64 specifications (although you will need to read the documentation for each individual compiler to be sure).
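Besides reading the compiler documentation, part of this can be checked at compile time. The following is a sketch, not a complete conformance test: it only compares what the implementation reports through `std::numeric_limits` and `sizeof` with the binary32 and binary64 parameters. The helper name `looks_like_ieee` is hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// True if T reports the parameters of an IEEE 754 binary format with the
// given total width, precision (including the implicit leading bit) and
// maximum exponent. This checks what the implementation claims, nothing more.
template <typename T>
constexpr bool looks_like_ieee(int total_bits, int precision, int max_exponent) {
    using L = std::numeric_limits<T>;
    return L::is_iec559 && L::radix == 2 &&
           sizeof(T) * CHAR_BIT == static_cast<std::size_t>(total_bits) &&
           L::digits == precision && L::max_exponent == max_exponent;
}

// Refuse to build the storage engine on a platform that does not match.
static_assert(looks_like_ieee<float>(32, 24, 128),
              "float does not look like IEEE 754 binary32");
static_assert(looks_like_ieee<double>(64, 53, 1024),
              "double does not look like IEEE 754 binary64");
```

Failing the build early is a design choice: it turns a silent file-format mismatch into a compile error on unusual platforms, including those where `CHAR_BIT` is not 8.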

Alternatively, you can use a software floating point implementation or wrapper such as FLIP, libgcc's soft-float, SoftFloat, or STREFLOP. These libraries sometimes still make assumptions about the implementation that are not completely portable according to the C++ standard, so use at your own risk.

Does this mean that 1.1+0.9=28.7 is allowed by the standard? — jinawee, Feb 27 '19 at 09:50
@jinawee: C++14, `[expr.add]/3` states "The result of the binary + operator is the sum of the operands.". `[numeric.limits]` somewhat constrains the results, but as far as I can see there is nothing stopping `1.1+0.9=28.7`, as long as appropriate values are listed for `numeric_limits::round_error()` and `numeric_limits::round_style`, beyond the fact that it would most likely be considered a very low quality implementation. — Mankarse, Feb 27 '19 at 11:34
C++ does not constrain this behaviour, because it is expected that any reasonable implementation will also implement `ISO/IEC 10967` (Language Independent Arithmetic) and/or `IEEE 754` (IEEE Floating Point), or otherwise have some reasonable behaviour. — Mankarse, Feb 27 '19 at 11:36

score 2 · Answer 2 · edited May 23 '17 at 12:08

--cut-- Never mind, https://stackoverflow.com/a/24157568/2422450 provides a better explanation of the float sizes.

If, however, you're thinking about storing these floats in binary data files, make sure you don't mess up the byte order (endianness). Different systems store the bytes of a value in different orders, so if you dump raw floats on one system, reinterpreting the 4 bytes you just read as a float on another system might give some surprising results.
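One way to avoid depending on the host byte order is to write the bit pattern one byte at a time. The sketch below stores a `float` least significant byte first, matching the little-endian layout the asker mentions. The names `store_float_le` and `load_float_le` are hypothetical, and the code assumes an IEEE 754 binary32 `float`, 8-bit bytes, and a platform where `float` and `std::uint32_t` share the same byte order in memory.

```cpp
#include <climits>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions of this sketch, checked where the language allows it.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes float is IEEE 754 binary32");

// Writes value into out[0..3], least significant byte first.
void store_float_le(float value, unsigned char* out) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the bit pattern
    for (int i = 0; i < 4; ++i) {
        // Shifts work on the integer's value, so the host byte order
        // of the integer does not matter here.
        out[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFFu);
    }
}

// Reads four bytes written by store_float_le and rebuilds the float.
float load_float_le(const unsigned char* in) {
    std::uint32_t bits = 0;
    for (int i = 0; i < 4; ++i) {
        bits |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    }
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, `1.0f` has the bit pattern `0x3F800000`, so `store_float_le` writes the bytes `00 00 80 3F` regardless of the host's endianness. The same pattern extends to `double` with `std::uint64_t` and eight bytes.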

answered Jun 11 '14 at 07:39 by Lanting; edited May 23 '17 at 12:08 by Community

Would you have a quote from the C or C++ standards to back the statement about the sizes? Or are you referring only to IEEE 754? If so, it might be worth clarifying. I suspect OP thinks the C and/or C++ standards mandate IEEE 754 floating point. – juanchopanza Jun 11 '14 at 07:52
Thanks. I'm handling byte ordering issues by always storing data in little endian byte order. – Jun 11 '14 at 07:56
"If you're however thinking about storing these floats in binary data files," then you should probably write your own size- and endianness-independent class to ensure guaranteed representation, since C++ does not require these attributes to be at all portable for its built-in types, and so storing them as binary is usually just asking for trouble. – underscore_d Feb 26 '16 at 20:10

score 2 · Answer 3 · edited Jun 20 '20 at 09:12

std::numeric_limits<T>::is_iec559

Determines if a given type follows IEC 559, which is another name for IEEE 754.

This serves as further evidence that IEEE is optional, and offers a way for you to check if it is used or not.

C++11 N3337 standard draft 18.3.2.4 numeric_limits members:

static constexpr bool is_iec559;

56 True if and only if the type adheres to IEC 559 standard. (217)

57 Meaningful for all floating point types.

(217) International Electrotechnical Commission standard 559 is the same as IEEE 754.

Sample code:

#include <iostream>
#include <limits>

int main() {
    std::cout << std::numeric_limits<float>::is_iec559 << std::endl;
    std::cout << std::numeric_limits<double>::is_iec559 << std::endl;
    std::cout << std::numeric_limits<long double>::is_iec559 << std::endl;
}

Outputs:

1
1
1

on Ubuntu 16.04 x86-64.

__STDC_IEC_559__ is an analogous macro for C: https://stackoverflow.com/a/31967139/895245

Rationale

This is an interesting article that describes the rationale behind not fixing sizes, and how to get around it: http://yosefk.com/blog/consistency-how-to-defeat-the-purpose-of-ieee-floating-point.html

score 0 · Answer 4 · answered Jun 11 '14 at 07:37

They are. "float" will be 32 bits, "double" will be 64 bits. The byte ordering might be different; it's exactly the same as with 32 bit and 64 bit integers.

If you need extended precision: That may or may not be available as "long double". And extended precision uses 80 bits, but "long double" may have additional padding bits.

answered Jun 11 '14 at 07:37 by gnasher729

Note that there also exists quad precision (128 bits), which may be used instead of x86's 80-bit precision on some platforms. – Richard J. Ross III Jun 11 '14 at 07:44

I don't think these sizes are fixed in the C or C++ standards. But do you have a quote to back this answer up? – juanchopanza Jun 11 '14 at 07:51
I guess you are referring to IEEE 754, not the C or C++ standards. Sorry for the confusion. – juanchopanza Jun 11 '14 at 07:57
