
To be specific, I am talking about the x86 PC architecture with its x87 FPU, and C compilers for it.

I am writing my own interpreter, and the reasoning behind the double datatype confuses me, especially where efficiency is concerned. Could someone explain WHY C has decided on a 64-bit double and not the hardware-native 80-bit double? And why has the hardware settled on an 80-bit double, since that is not aligned? What are the performance implications of each? I would like to use an 80-bit double for my default numeric type, but the choices of the compiler developers make me concerned that this is not the best choice.

  1. double on x86 is only 2 bytes shorter, why doesn't the compiler use the 10 byte long double by default?
  2. Can I get an example of the extra precision gotten by 80-bit long double vs double?
  3. Why does Microsoft disable long double by default?
  4. In terms of magnitude, how much worse / slower is long double on typical x86/x64 PC hardware?
asked by unixman83, edited by Peter Cordes
  • x86 has pretty much moved back to 64-bit double with SSE and such... 80-bit FP turned out to be a mess since, yeah, it isn't a power-of-two. – Mysticial Apr 21 '12 at 03:51
  • No, it does use the FPU. The x87 FPU supports rounding to single and double precision. – Mysticial Apr 21 '12 at 03:53
  • I am interested in the performance of `80-bit` because the extra hardware native precision seems nice for an interpreter. If the hardware's just rounding to 64-bits, what are the consequences of defaulting to `long double`? – unixman83 Apr 21 '12 at 03:56
  • For one, there's no vectorization support for 80-bit. So you lose whatever gains you get from SSE/AVX. On x64, the x87 FPU only has half the registers as the SSE/AVX units. As far as scalar performance goes, I'm not sure since I've never seriously used the x87 FPU. I assume it's about the same as scalar SSE. – Mysticial Apr 21 '12 at 03:59
  • IMO, the x87 FPU is useless unless you actually need 80-bit floats. If you're just gonna use 64-bit double, SSE is the way to go - vectorized or not. – Mysticial Apr 21 '12 at 04:04
  • So in that case, there will be a performance difference between 80-bit floats vs. 64-bit floats. And that gap is getting wider. So you're discouraged from using 80-bit floats. (As Microsoft has by simply getting rid of it completely.) – Mysticial Apr 21 '12 at 04:06
  • Does Microsoft's compiler use SSE for its `double` data-type now? I am going to use C++ and the MS compiler to write the interpreter. – unixman83 Apr 21 '12 at 04:12
  • When compiling for x64, yes it will use SSE for `double`. On x86, it will only use it if you specify `/arch:SSE2` - since not all 32-bit x86 machines have SSE2. – Mysticial Apr 21 '12 at 04:14
  • Possible duplicate of [Why did Microsoft abandon long double data type?](https://stackoverflow.com/questions/7120710/why-did-microsoft-abandon-long-double-data-type) – phuclv Sep 10 '18 at 17:17
  • Cross-site duplicate which addresses performance on modern CPUs: [Did any compiler fully use Intel x87 80-bit floating point?](//retrocomputing.stackexchange.com/a/9760) – Peter Cordes Nov 28 '19 at 03:35

3 Answers


The answer, according to Mysticial, is that Microsoft uses SSE2 for its double data-type. The x87 floating-point unit (FPU) is seen as outdated and slow in comparison to modern CPU extensions. SSE2 does not support the 80-bit format, hence the compiler's choice of 64-bit precision.

On the 32-bit x86 architecture, since not all CPUs have SSE2, Microsoft still uses the x87 FPU unless the compiler switch /arch:SSE2 is given, which makes the code incompatible with those older CPUs.
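For reference, a minimal compile-time sketch (relying only on MSVC's documented `_M_X64` and `_M_IX86_FP` predefined macros) that reports which of these code-generation modes is in effect:

```c
/* _M_IX86_FP is defined only for 32-bit x86 builds: 0 means /arch:IA32
 * (x87 only), 1 means /arch:SSE, 2 means /arch:SSE2 or higher (the default
 * on current MSVC).  x64 builds always use SSE2 or better. */
#if defined(_M_X64)
#   pragma message("x64 target: double math uses SSE2 registers/instructions")
#elif defined(_M_IX86_FP) && _M_IX86_FP >= 2
#   pragma message("x86 target with /arch:SSE2 (or higher): SSE2 double math")
#else
#   pragma message("x86 target without /arch:SSE2: double math uses the x87 FPU")
#endif
```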

– unixman83

Wrong question. It has nothing to do with C; as far as I know, essentially all languages use 32-bit single precision and 64-bit double precision as their standard floating-point types. C, as a language supporting different hardware, only requires

sizeof(float) <= sizeof(double) <= sizeof(long double)

so it is perfectly acceptable for a specific C compiler to use 32-bit floats for all three types.
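As a minimal sketch, this only reports whatever sizes a particular compiler happened to choose; the exact numbers are implementation-defined:

```c
#include <stdio.h>

int main(void)
{
    /* Typical x86-64 GCC/Clang output: 4, 8, 16 (the 80-bit value is padded
     * out to 16 bytes); MSVC prints 4, 8, 8. */
    printf("float:       %zu bytes\n", sizeof(float));
    printf("double:      %zu bytes\n", sizeof(double));
    printf("long double: %zu bytes\n", sizeof(long double));
    return 0;
}
```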

Intel decided, on Kahan's advice, to support as much precision as possible and to carry out calculations on the less precise formats (32 and 64 bit) internally with 80-bit precision.

The difference in precision and exponent range: the 64-bit format has approximately 16 decimal digits and a maximum decimal exponent of 308, while the 80-bit format has about 19 digits and a maximum decimal exponent of 4932.

Because it is much more precise and has a far greater exponent range, you can compute intermediate results without overflow or underflow, and the final result has fewer rounding errors.
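A small sketch that reads these limits straight from `<float.h>`, so the numbers above can be checked on any implementation:

```c
#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("double:      %d-bit significand, %d decimal digits, max ~10^%d\n",
           DBL_MANT_DIG, DBL_DIG, DBL_MAX_10_EXP);
    printf("long double: %d-bit significand, %d decimal digits, max ~10^%d\n",
           LDBL_MANT_DIG, LDBL_DIG, LDBL_MAX_10_EXP);
    /* With the x87 80-bit format this typically prints 53/15/308 for double
     * and 64/18/4932 for long double. */
    return 0;
}
```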

So the real question is why long double does not expose the 80-bit format. In fact many compilers did support it, but the lack of use and the race for benchmark performance effectively killed it.

– Thorsten S.
  • Was the problem benchmark performance, or the fact that people working with code that nominally used "long double", but whose machines made it a 64-bit type, tended to write printf format strings that were wrong but would work [only] on platforms where "long double" was 64 bits? It's too bad the C Standard never specified a means by which variadic-function prototypes could specify minimum and maximum sizes for integer and floating-point types, so that printf implementations could say, e.g. "promote all integer types to at least 32 bits, and all floating-point types to 80 bits", and thus... – supercat May 09 '16 at 17:46
  • ...avoid having to worry about size distinctions for anything but 64-bit integer types [and an implementation where "long" was 64 bits could push the minimum size to 64 bits if desired]. – supercat May 09 '16 at 17:48
  • @supercat I really think it was more the dominance of C and its successors. I think it is largely forgotten that at the time the Intel processors came out, there were many other architectures available: Motorola, VAX & RISC processors which did not have 80bit support. All those architectures were programmed in C and for interoperability they used the standard sizes. It is a good argument to choose 32bit float and 64bit double because it fits perfectly for integer datatypes; a simple cast is sufficient, you do not need to convert/round/move. – Thorsten S. May 10 '16 at 22:43
  • @ThorstenS.: The ability of the extended type to hold any value of type Int64 or UInt64 means that operations between one of those and a floating-point type can be accommodated accurately by promoting both to extended. The lack of such a type makes such operations more problematical (e.g. `LL2=LL1-1.0f` may subtract a lot more than one from `LL2`). – supercat May 11 '16 at 14:55

This is actually several questions rolled into one, some of which are too broad.

Could someone explain WHY C has decided on a 64-bit double and not the hardware native 80-bit double?

It's irrelevant to C, because the C standard only mandates minimum requirements for the built-in types, and it's entirely up to the compiler implementation to choose whatever format it wants to use for a type. Nothing prevents a C compiler from using some custom-made 77-bit floating-point type.


And why has the hardware settled on an 80-bit double, since that is not aligned? What are the performance implications of each?

It's aligned to a multiple of 2 bytes. Remember that x87 dates back to 8086 + 8087.

It's a good trade-off for modern hardware implementers and for software writers who need more precision for exact rounding in double operations. Make the type too big and you'll need significantly more transistors: double the number of bits in the significand and the multiplier needs to be roughly four times as big.

William Kahan, a primary designer of the x87 arithmetic and initial IEEE 754 standard proposal, notes on the development of the x87 floating point: "An Extended format as wide as we dared (80 bits) was included to serve the same support role as the 13-decimal internal format serves in Hewlett-Packard’s 10-decimal calculators." Moreover, Kahan notes that 64 bits was the widest significand across which carry propagation could be done without increasing the cycle time on the 8087, and that the x87 extended precision was designed to be extensible to higher precision in future processors: "For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."

https://en.wikipedia.org/wiki/Extended_precision#IEEE_754_extended_precision_formats

As you can see, with the 64-bit significand you can share the components (adder, multiplier...) with the integer ALU.
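A sketch (assuming GCC or Clang on x86) that dumps the raw bytes of a long double; the value itself occupies 10 bytes, and the rest is alignment padding added by the compiler:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    long double x = 1.0L;
    unsigned char bytes[sizeof x];
    memcpy(bytes, &x, sizeof x);            /* inspect the raw representation */

    printf("sizeof(long double) = %zu\n", sizeof x);
    for (size_t i = 0; i < sizeof x; ++i)   /* little-endian byte order */
        printf("%02x ", (unsigned)bytes[i]);
    printf("\n");
    /* For 1.0L the low 8 bytes are the 64-bit significand 0x8000000000000000
     * (note the explicit integer bit), the next 2 bytes hold sign + exponent
     * (0x3fff), and any remaining bytes are padding. */
    return 0;
}
```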


I would like to use an 80-bit double for my default numeric type. But the choices of the compiler developers make me concerned that this is not the best choice. double on x86 is only 2 bytes shorter, why doesn't the compiler use the 10 byte long double by default?

It's actually intended for use in temporaries (like tmp = (b*c + d)/e) to avoid intermediate overflow or underflow without special techniques like Kahan summation. It's not meant to be your default floating-point type. In fact many people use floating-point literals incorrectly with long double or float: they forget to add the correct suffix, lose precision as a result, and then ask why long double behaves exactly the same as double. In summary, double should be used in almost every case, unless you're limited by bandwidth or precision and you really know what you're doing.
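A sketch of the literal-suffix pitfall mentioned above; without the `L` suffix the constant is a double, so the extra precision is already gone before the assignment happens:

```c
#include <stdio.h>

int main(void)
{
    long double wrong = 0.1;    /* double literal, rounded to 53 bits first */
    long double right = 0.1L;   /* long double literal, 64-bit significand  */

    printf("wrong: %.21Lf\n", wrong);
    printf("right: %.21Lf\n", right);
    /* With an 80-bit long double (e.g. x86 GCC/Clang) the two lines start to
     * differ around the 18th decimal digit. */
    return 0;
}
```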


Can I get an example of the extra precision gotten by 80-bit long double vs double?

You can print the full value and see it for yourself. There are also a lot of related questions that are worth reading.
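For instance, here is a minimal sketch: a 53-bit significand cannot represent 10^16 + 1, but the 64-bit significand of the x87 extended format can:

```c
#include <stdio.h>

int main(void)
{
    double      d  = 1e16;
    long double ld = 1e16L;

    printf("double:      (1e16 + 1) - 1e16 = %g\n",  (d  + 1.0)  - d);
    printf("long double: (1e16 + 1) - 1e16 = %Lg\n", (ld + 1.0L) - ld);
    /* Built for x86-64 (SSE2 double math) this prints 0 then 1.  With MSVC
     * both lines print 0, and 32-bit x87 code generation may keep even the
     * double intermediate at 80 bits. */
    return 0;
}
```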


Why does Microsoft disable long double by default?

Microsoft doesn't disable long double by default. They just chose to map long double to IEEE-754 double precision, which is incidentally the same format as double. The type long double can still be used normally. They did that because math on SSE is faster and more consistent; it avoids the class of "bugs" where results change depending on whether an intermediate value stayed in an 80-bit x87 register or was spilled to memory as a 64-bit double.

Besides, a 64-bit long double doesn't have the odd size that would require the compiler to pad it with 6 more zero bytes (or deal with a non-power-of-2 type width), which is a waste of resources.
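A quick sketch for verifying this on your own compiler; under MSVC both types report the same IEEE-754 binary64 parameters:

```c
#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("sizeof: double = %zu, long double = %zu\n",
           sizeof(double), sizeof(long double));
    printf("significand bits: DBL_MANT_DIG = %d, LDBL_MANT_DIG = %d\n",
           DBL_MANT_DIG, LDBL_MANT_DIG);
    /* MSVC prints 8/8 and 53/53; x86-64 GCC or Clang prints 8/16 and 53/64. */
    return 0;
}
```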

That said, it's not as if the 80-bit long double is unavailable on x86. Currently only MSVC has abandoned the extended-precision type; other compilers for x86 (like GCC, Clang, ICC...) still support it and make 80-bit IEEE-754 extended precision the default format for long double. For example, GCC has -mlong-double-64/80/128 and -m96/128bit-long-double to control the exact format of long double.

Or, to avoid potentially breaking ABI compatibility by changing long double, you can use GNU C floating-point type names like __float80 on targets that support it. This example on Godbolt compiles to 80-bit FP math whether it targets Windows or Linux.
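A sketch of using the GNU extension directly (assuming GCC on an x86 target; `__float80` is not ISO C, and the cast used for printing assumes long double is also the 80-bit type, as it is with GCC on Linux and MinGW):

```c
#include <stdio.h>

int main(void)
{
    /* __float80 is the x87 extended type regardless of what long double is. */
    __float80 acc = 0;
    for (int i = 1; i <= 10; ++i)
        acc += (__float80)1 / i;        /* partial harmonic sum in 80-bit */

    /* printf has no length modifier for __float80, so cast for printing. */
    printf("H(10) = %.20Lg\n", (long double)acc);
    return 0;
}
```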


In terms of magnitude, how much worse / slower is long double on typical x86/x64 PC hardware?

This cannot be answered precisely, because latency and throughput depend on each specific microarchitecture. However, if you do a lot of floating-point operations then double will be significantly faster, because it has fewer bits in the significand and it can be parallelized with SIMD. For example, you can work on a vector of 8 doubles at a time with AVX-512; that can't be done with the extended-precision type.
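A rough sketch of that SIMD point, assuming an AVX-512F-capable CPU and compiler (e.g. `gcc -mavx512f`); the function names here are made up for illustration:

```c
#include <immintrin.h>
#include <stddef.h>

/* Sums 8 doubles per iteration with a single 512-bit vector addition. */
double sum_double_avx512(const double *a, size_t n)
{
    __m512d acc = _mm512_setzero_pd();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm512_add_pd(acc, _mm512_loadu_pd(a + i));

    double lanes[8], total = 0.0;
    _mm512_storeu_pd(lanes, acc);           /* horizontal reduction */
    for (int k = 0; k < 8; ++k)
        total += lanes[k];
    for (; i < n; ++i)                      /* leftover elements */
        total += a[i];
    return total;
}

/* There are no SIMD registers for the 80-bit format, so this stays scalar. */
long double sum_long_double(const long double *a, size_t n)
{
    long double total = 0.0L;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```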

Also, 80-bit x87 fp load and store instructions are significantly slower than the "normal" versions that convert to/from 32 or 64-bit, and only fstp is available, not fst. See Peter Cordes's answer on retrocomputing about x87 performance on modern CPUs. (In fact that's a cross-site duplicate of this, asking why MSVC doesn't expose an 80-bit x87 type as long double.)

– phuclv