Floating-point literal to IEEE-754 binary pattern consistency across compilers

Question

Here's a question with answers on "Cross Platform Floating Point Consistency" but it talks exclusively about runtime consistency (of IEEE floating point).

I'm interested in compile-time consistency, specifically:

If I have a specific floating-point number and want to write it as a floating-point literal in my source code, such that every compiler targeting an IEEE-754 architecture compiles the literal to the bit pattern of exactly that float (or double), what do I need to do?

A certain number of digits?
The exact decimal number for that bit pattern (rather than any decimal number that maps to that binary pattern)?
Or?

(I know there has been controversy for years over what you need to do to round-trip floating point values from IEEE format to decimal representations and back and I don't know if this is or is not an issue with floating point literals and the compilers (and the C++ standard).)

If you can use C++17, you can round-trip using hexadecimal floating point literals. I do this all the time for (say) digital filter coefficients, quadrature nodes, so on. Controversy avoided. — user14717, Mar 06 '19 at 18:13
That's a great answer - except I happen to be limited to C++11 currently. So I'm also interested in "downlevel" C++! — davidbak, Mar 06 '19 at 18:16
I think you will suffer then. If there was a good answer to your question which didn't resort to hexadecimal floating point literals, then why did it need to be added to C++17? — user14717, Mar 06 '19 at 18:19
Note: I'm not an expert on this, but check out the "table maker's dilemma" in Higham's Accuracy and Stability of Numerical Algorithms. I *think* this shows what you want is impossible. — user14717, Mar 06 '19 at 18:21
@user14717 - Well, maybe there are ways to do it which are inconvenient and/or non-obvious. I mean, why add binary integer literals when we already had hex and octal literals? — davidbak, Mar 06 '19 at 18:22
@user14717 - w.r.t. table maker's dilemma - I don't think so, this is more related to, e.g., [this paper "How to read floating point numbers accurately"](https://dl.acm.org/citation.cfm?id=93557) - a 1990 paper but progress (in terms of runtime libraries) in this particular area has been slow. — davidbak, Mar 06 '19 at 18:25
Without hexadecimal FP literals: What about storing the exact bit pattern in a char array and `memcpy` it into the double variable? — Aconcagua, Mar 06 '19 at 19:08
@Aconcagua - thanks - I know how to "work around" the issue but what I'm looking for is a rule on how to use C++ floating-point literals so that workarounds aren't necessary. — davidbak, Mar 06 '19 at 19:28

score 3 · Accepted Answer · answered Mar 06 '19 at 21:26

You can take advantage of the fact that, while not every decimal floating point number has an exact representation in the IEEE-754 floating point format (which uses binary), every finite IEEE floating point number has an exact representation as a decimal number with finitely many digits.

The C++ language specification discusses floating point literals in [lex.fcon] ("floating literals"). After describing all the parts of a floating point literal, it says

If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.

(This wording is the same in both N3242, a late C++11 working paper, and N4741 from 2018. I was unable to find this description on CPPReference.)

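This means that a number like 0.1, which has no exact binary representation, can compile to a value either slightly less or slightly more than the desired one, depending on the implementation. Numbers that are exactly representable, like 0.5 or 0.000000000931322574615478515625 (2^-30), will have exactly that value with all conforming compilers.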
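
As a rough illustration of that distinction, here is a minimal C++11 sketch that inspects the bit patterns the compiler produced for these literals. It assumes `double` is the 64-bit IEEE-754 binary64 format; `bits_of` and `check_literals` are names made up for this example, not anything from the standard.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw bit pattern of a double. Assumes double is IEEE-754 binary64.
std::uint64_t bits_of(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);  // well-defined way to read the bytes
    return u;
}

void check_literals() {
    // Exactly representable literals: only one bit pattern is permitted.
    assert(bits_of(0.5) == 0x3FE0000000000000ULL);
    assert(bits_of(0.000000000931322574615478515625)  // 2^-30
           == 0x3E10000000000000ULL);

    // 0.1 is not representable, so the quoted wording permits either of
    // the two neighbouring doubles.
    const std::uint64_t tenth = bits_of(0.1);
    assert(tenth == 0x3FB9999999999999ULL || tenth == 0x3FB999999999999AULL);
}
```

Calling `check_literals()` once at startup (or from a unit test) would flag a compiler that disagrees on the exact cases. The check on 0.1 deliberately accepts both neighbours, since that is all the standard promises.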

You'll need to take your decimal number, get the IEEE-754 representation of the representable number either just before or just after it, then convert that representation back to its exact decimal equivalent. Once you use that exact decimal as your literal, all standards-conforming compilers that support the IEEE-754 floating point format should give you the exact same constant.
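
The last step of that procedure can be sketched in C++11 as below. This is not the only way to do it, and it rests on one assumption worth stating loudly: that the C library's `printf` family converts exactly when asked for many digits (glibc does; some older runtimes pad with zeros instead). `exact_decimal` and `round_trips` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Exact decimal expansion of a finite float, as text.
// ASSUMPTION: snprintf performs exact conversion at high precision.
std::string exact_decimal(float x) {
    char buf[256];
    // 149 fractional digits are enough for any float, including subnormals
    // (the smallest is 2^-149).
    std::snprintf(buf, sizeof buf, "%.149f", static_cast<double>(x));
    std::string s(buf);
    // Drop trailing zeros, but keep one digit after the decimal point.
    std::size_t last = s.find_last_not_of('0');
    if (s[last] == '.') ++last;
    s.erase(last + 1);
    return s;
}

// Sanity check: parsing the exact decimal must give back the same float.
bool round_trips(float x) {
    return std::strtof(exact_decimal(x).c_str(), nullptr) == x;
}
```

For the float nearest to 0.1 (bit pattern 0x3DCCCCCD), the exact expansion is 0.100000001490116119384765625, so writing `0.100000001490116119384765625f` in the source leaves the compiler no rounding choice to make. Appending the `f` suffix to the returned string is left to the caller.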

This is doable - esp. (for `float`) with the aid of the [online converter here](https://www.h-schmidt.net/FloatConverter/IEEE754.html). And maybe it's the best rule overall. Let's wait just a bit and see. — davidbak, Mar 06 '19 at 21:35
