C++17 is adding hexadecimal floating constants (floating-point literals). Why? How about a couple of examples showing the benefits?
-
(1) For long-overdue compatibility with C (starting with ISO C99). (2) To *unambiguously* define floating-point constants (for IEEE-754 binary floating-point formats in particular), without the bugs often affecting conversions from decimal floating point (see [Rick Regan's blog](http://www.exploringbinary.com/) for examples of such bugs in modern, commonly used software). – njuffa Apr 01 '16 at 04:59
-
An example of the need to unambiguously specify literal floating-point constants is the creation of code for math library functions, e.g. [`atanf`](http://stackoverflow.com/questions/26692859/best-machine-optimized-polynomial-minimax-approximation-to-arctangent-on-1-1), [`erff`](http://stackoverflow.com/questions/35148198/efficient-faithfully-rounded-implementation-of-error-function-erff), [`expf`](http://stackoverflow.com/questions/29381117/which-exponentiation-algorithms-do-cpu-programming-languages-use). – njuffa Apr 01 '16 at 05:22
2 Answers
Floating-point numbers are stored in x86/x64 processors in base 2, not base 10: https://en.wikipedia.org/wiki/Double-precision_floating-point_format . Because of that, many decimal floating-point numbers cannot be represented exactly; e.g. decimal 0.1 could be represented as something like 0.1000000000000003 or 0.0999999999999997, whichever has the base-2 representation closest to decimal 0.1. Because of that inexactness, printing a floating-point number in decimal and then parsing it back may produce a slightly different number than the one that was stored in memory in binary before printing.
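As a minimal sketch of this inexactness (the exact digits printed may vary by compiler and C library):

```cpp
#include <cstdio>

int main() {
    // The decimal literal 0.1 is converted to the nearest base-2 double,
    // which is not exactly one tenth.
    std::printf("%.20f\n", 0.1);  // typically prints 0.10000000000000000555
}
```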
For some applications such errors are unacceptable: they need to parse back exactly the same binary floating-point number that existed before printing (e.g. one application exports floating-point data and another imports it). For that, one can export and import doubles in hexadecimal format. Because 16 is a power of 2, binary floating-point numbers can be represented exactly in hexadecimal format.
`printf` and `scanf` have been extended with the `%a` format specifier, which allows printing and parsing hexadecimal floating-point numbers. However, MSVC++ does not support the `%a` format specifier for `scanf` yet: "The a and A specifiers (see printf Type Field Characters) are not available with scanf."
To print a `double` in full precision in hexadecimal format, one should specify printing of 13 hexadecimal digits after the point, which correspond to 13*4=52 bits of the mantissa:
double x = 0.1;
printf("%.13a", x);
See more details on hexadecimal floating point, with code and examples, in Rick Regan's article (note that, at least for MSVC++ 2013, a plain `%a` in `printf` prints 6 hexadecimal digits after the point, not 13; this is stated at the end of the article).
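A rough sketch of such an exact export/import round trip (parsing with `strtod`, which accepts hexadecimal floating-point input per C99, since, as quoted above, MSVC++'s `scanf` may not support `%a`):

```cpp
#include <cstdio>
#include <cstdlib>

int main() {
    double exported = 0.1;
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.13a", exported);  // e.g. "0x1.999999999999ap-4"
    double imported = std::strtod(buf, nullptr);         // exact hex-to-binary conversion
    std::printf("round-trip exact: %d\n", exported == imported);  // prints 1
}
```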
Specifically for constants, as asked in the question, hexadecimal constants can be convenient for testing the application on exact hard-coded floating-point inputs. E.g. your bug may be reproducible for 0.1000000000000003 but not for 0.0999999999999997, so you need a hexadecimal hard-coded value to specify exactly which representation of decimal 0.1 is of interest.
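For illustration (assuming IEEE-754 binary64 `double`), a C++17 hexadecimal literal pins down exactly which of the neighbouring values is meant:

```cpp
double x = 0x1.999999999999ap-4;  // the double nearest to decimal 0.1
double y = 0x1.9999999999999p-4;  // its neighbour one ulp below
static_assert(0x1.999999999999ap-4 == 0.1, "holds for IEEE-754 binary64");
```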

-
The examples shown in the article are for gcc. In the last paragraph of the article I say to use “%.13a” instead of “%a” for VC++. – Rick Regan Apr 01 '16 at 14:29
-
@RickRegan, sorry, I didn't notice that part at the end of the article initially. I've edited my answer. – Serge Rogatch Apr 01 '16 at 15:49
-
Note that (1) all floating-point numbers in base `a` are representable in base `b` if and only if `b` is divisible by all prime factors of `a`, (2) so all numbers in base 2 can be represented exactly in base 10 too, and (3) 17 decimal digits are enough for a `double` to round-trip. – user202729 Apr 25 '18 at 16:04
The two main reasons to use hex floats over decimals are accuracy and speed.
The algorithms for accurately converting between decimal constants and the underlying binary format of floating point numbers are surprisingly complicated, and even nowadays conversion errors still occasionally arise.
Converting between hexadecimal and binary is a much simpler endeavour, and it is guaranteed to be exact. An example use case is when it is critical that you use a specific floating-point number, and not the one on either side of it (e.g. for implementations of special functions such as `exp`). This simplicity also makes the conversion much faster (it doesn't require any intermediate "bignum" arithmetic): in some cases I've seen a 3x speed-up for read/write operations for hex floats vs decimals.
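As a hypothetical illustration (the coefficients below are just truncated Taylor-series terms, not the tuned minimax coefficients a real `exp` implementation would use), hex floats let such constants be reproduced bit-for-bit by every conforming compiler, independent of the quality of its decimal-to-binary conversion:

```cpp
constexpr double c2 = 0x1p-1;                // 1/2, exactly
constexpr double c3 = 0x1.5555555555555p-3;  // the double nearest to 1/6
constexpr double c4 = 0x1.5555555555555p-5;  // the double nearest to 1/24

// exp(x) ≈ 1 + x + c2*x^2 + c3*x^3 + c4*x^4 for small x (Horner form)
double exp_small(double x) {
    return 1 + x * (1 + x * (c2 + x * (c3 + x * c4)));
}
```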

-
I don't understand how this could make code faster. E.g. `0xA.0f` should be *exactly* the same as `10.0f` at the bit level, so how could the source-code representation possibly make the implementation any faster? Why any runtime conversion at all? – RamblingMad Jun 21 '16 at 02:12
-
Wait, what if one platform uses base-2 floats and another uses decimal floats? IEEE 754 allows various formats. – Swift - Friday Pie Aug 16 '17 at 14:38
-
There aren't that many platforms that support decimal floats in hardware, and those that do still tend to keep binary floats as the default, keeping decimal as a secondary type. – Simon Byrne Aug 18 '17 at 21:11
-
Re "guaranteed to be exact": Only if there are enough bits to represent, of course. | The question is about _constant_, not _reading/writing_. – user202729 Apr 25 '18 at 16:06