What uncommon floating-point sizes exist in C++ compilers?

Question

The C++14 draft standard seems rather quiet about the specific requirements for float, double and long double, although these sizes seem to be common:

float: IEEE 32-bit floating-point representation (roughly 7 digits of precision, exponent range of 1e-38..1e+38)
double: IEEE 64-bit floating-point representation (roughly 16 digits of precision, exponent range of 1e-308..1e+308)
long double: 80-bit floating-point representation (roughly 19 digits of precision, exponent range of 1e-4932..1e+4932)

What C++ compilers and systems currently use floating-point sizes other than these?

I'm interested in longer, shorter, and non-binary representations using the standard types, not libraries, as my primary interest is portability of C++ programs.

This is a really interesting question, but I'm concerned it might not be a good fit for the Stack Overflow Q&A format because it doesn't have a single, definitive, best answer. That said, I'm still really curious about this! — templatetypedef, Jul 21 '16 at 16:12
A quick search from the standard doc "There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double". As you can see it only provides the minimum precision not the upper bound. — anurag-jain, Jul 21 '16 at 16:15
In VS, double and long double offer the same. Refer https://msdn.microsoft.com/en-IN/library/s3f49ktz.aspx. — anurag-jain, Jul 21 '16 at 16:23
Apparently some platforms/compilers can do 128-bit quadruple precision floating-points: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Computer-language_support — jpw, Jul 21 '16 at 16:49
jpw: I've heard of references to 128 bit floating-point, and I also have heard there are implementations that use decimal floating-point rather than binary, but I don't know of any specific examples. Do you? — Mike, Jul 21 '16 at 17:53
@Mike I haven't, but it's way outside my field of knowledge. The Wikipedia page mentions the Oracle Sun Studio compiler (or w/e it's called these days) and Intel doing 128-bit on PPC and Sparc, but as I said, this is outside my domain - I just thought the link might be interesting :) — jpw, Jul 21 '16 at 18:46
@anurag-jain there's nothing strange about that. Most non-x86 architectures like ARM, MIPS, SPARC... don't have an extended precision floating-point type so they map `long double` to the same type as double. Only recently GCC is migrating `long double` to IEEE quadruple precision — phuclv, Mar 16 '20 at 01:50

phuclv · Answer 1 · 2021-07-30T15:52:42.210

It's unclear which "uncommon sizes" you mean.

If you're only asking about the size in bits, then "odd-sized" types (i.e. not a power of 2) usually exist on older platforms that don't use 8-bit (or another power-of-2) bytes.

One example is the Unisys ClearPath Dorado servers, with a 36-bit float and a 72-bit double. That beast is still in active development; the last version was released in 2018. Mainframes and servers live very long lives, so you can still see PDP-10 and other such architectures in use today, with modern compiler support.

But even on newer platforms you can still see some examples, like Intel Itanium's 82-bit extended float format. Many platforms also use a 40-bit floating-point format. It's especially common in modern DSPs that use 40-bit accumulators, like the TI C3x/C4x, SHARC ADSP-21160 and Atmel TSC21020F. There are also many old 40-bit floating-point formats, like the IBM extended or Microsoft MBF extended formats. See also Why did 8-bit Basic use 40-bit floating point?

In addition, a few modern C/C++ compilers for microcontrollers provide non-standard 24-bit floats. And in computer graphics, minifloat formats such as 10-bit or 11-bit floats are not unusual, besides 16- and 24-bit floats.

If you care about the formats, then there are lots of standard-compliant 32-, 64- and 128-bit floating-point formats that aren't IEEE-754 binary formats, like the hex and decimal floating-point types on IBM z, the Cray formats and the VAX formats.

In fact IBM z is one of the very rare modern platforms with decimal float hardware, although GCC and some other compilers provide built-in software support for decimal float. IBM also uses the special double-double format, which is still the default for long double on PowerPC.

Here's a summary of most of the available floating-point formats. See also Do any real-world CPUs not use IEEE 754? For more information, continue to the next section.

Types in C++ are generally mapped to hardware types for performance reasons, so the floating-point types will be whatever is available on the CPU, if it has an FPU at all. In modern computers IEEE-754 is the dominant hardware format, and because of the minimum range and precision that the standard requires, float and double are in practice mapped to IEEE-754 single and double precision respectively.
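Which mapping a particular compiler has chosen can be checked with std::numeric_limits instead of being assumed. The following is only a minimal sketch; the helper names describe and describe_all are made up for this example, and the output depends entirely on the compiler and target.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Describe one floating-point type using only what the implementation reports.
// (Hypothetical helper, not part of any library.)
template <typename T>
std::string describe(const char* name) {
    assert(name != nullptr);
    using L = std::numeric_limits<T>;
    std::ostringstream out;
    out << name
        << ": bytes=" << sizeof(T)               // storage size, including any padding
        << " radix=" << L::radix                 // 2 = binary, 10 = decimal, 16 = hex float
        << " mantissa_digits=" << L::digits      // counted in radix digits
        << " decimal_digits=" << L::digits10
        << " max_exp10=" << L::max_exponent10
        << " iec559=" << (L::is_iec559 ? "yes" : "no");
    return out.str();
}

// One line per standard floating-point type.
std::string describe_all() {
    return describe<float>("float") + "\n" +
           describe<double>("double") + "\n" +
           describe<long double>("long double") + "\n";
}
```

Printing describe_all() on each target you care about shows at a glance whether long double is wider than double there, and whether the radix is something other than 2.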

Hardware support for higher-precision types is uncommon, except on x86 and a few other rare platforms with 80-bit extended precision, so on platforms without it long double is usually mapped to the same type as double. However, long double has recently been migrating slowly to IEEE-754 quadruple precision in many compilers like GCC or Clang. Since that type is implemented in a built-in software library, performance is a lot worse. Depending on whether you favor faster execution or higher precision, you're still free to choose which type long double maps to, though. For example, on x86 GCC has the -mlong-double-64/80/128 options to set the format of long double and the -m96bit-long-double/-m128bit-long-double options to set its padding. Similar options are also available on many other architectures, like the S/390 and zSeries.

PowerPC, on the other hand, by default uses a completely different 128-bit long double format that is implemented using double-double arithmetic and has the same range as IEEE-754 double precision. Its precision is slightly lower than quadruple precision, but it's a lot faster because it can use the hardware double arithmetic. As above, you can choose between the two formats with the -mabi=ibmlongdouble/ieeelongdouble options. The same trick is also used on some platforms where only 32-bit float is supported, to get near-double precision.
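To give an idea of how double-double arithmetic gets extra precision out of ordinary hardware doubles, here is a rough sketch based on Knuth's TwoSum. The DoubleDouble struct and the function names are invented for illustration; this is not how any particular compiler's runtime implements long double, and it assumes round-to-nearest IEEE double evaluation (no -ffast-math, no x87 excess precision).

```cpp
#include <cassert>
#include <cmath>

// A double-double value is the unevaluated sum hi + lo of two doubles,
// where lo holds the part that did not fit into hi.
struct DoubleDouble {
    double hi;
    double lo;
};

// Knuth's TwoSum: s is the rounded sum, e is the exact rounding error,
// so that a + b == s + e holds exactly for finite inputs.
inline DoubleDouble two_sum(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Add a plain double to a double-double value (simplified: no inf/NaN handling).
inline DoubleDouble add(DoubleDouble x, double y) {
    DoubleDouble t = two_sum(x.hi, y);
    t.lo += x.lo;
    return two_sum(t.hi, t.lo);  // renormalize so that lo stays small relative to hi
}
```

For example, two_sum(1.0, 1e-20) keeps 1.0 in hi and 1e-20 in lo, whereas a single double would simply lose the small term. The exponent range stays that of double, because both halves are ordinary doubles.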

IBM z mainframes traditionally use the IBM hex float formats, and they still use them nowadays. But they also support IEEE-754 binary and decimal floating-point types in addition to that:

The format of floating-point numbers can be either base 16 S/390® hexadecimal format, base 2 IEEE-754 binary format, or base 10 IEEE-754 decimal format. The formats are based on three operand lengths for hexadecimal and binary: short (32 bits), long (64 bits), and extended (128 bits). The formats are also based on three operand lengths for decimal: _Decimal32 (32 bits), _Decimal64 (64 bits), and _Decimal128 (128 bits).

Floating-point numbers

Other architectures may have other floating-point formats, like VAX or Cray. However, since those mainframes are still being used, their newer hardware versions also include support for IEEE-754, just as IBM did with its mainframes.

On modern platforms without an FPU, the floating-point types are usually IEEE-754 single and double precision, for better interoperability and library support. However, on 8-bit microcontrollers even single precision is too costly, so some compilers support a non-standard mode where float is a 24-bit type. For example, the XC8 compiler uses a 24-bit floating-point format that is a truncated form of the 32-bit format, and NXP's MRK uses a different 24-bit float format.

Due to the rise of graphics and AI applications that need a narrower floating-point type, 16-bit formats like IEEE-754 binary16 and Google's bfloat16 have also been introduced on many platforms, and compilers have some limited support for them, like __fp16 in GCC.
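bfloat16 is easy to picture because it is just the upper 16 bits of an IEEE-754 binary32 value: the sign, all 8 exponent bits, and the top 7 fraction bits. The sketch below illustrates that with plain bit manipulation. The function names are made up, it truncates instead of rounding to nearest as real converters usually do, and it assumes float is binary32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes float is IEEE-754 binary32");

// Keep the sign, the 8 exponent bits and the top 7 fraction bits.
// Truncation only; NaN is excluded because dropping payload bits could turn it into infinity.
inline std::uint16_t float_to_bfloat16_bits(float f) {
    assert(!std::isnan(f));
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widening back is exact: the dropped fraction bits are filled with zeros.
inline float bfloat16_bits_to_float(std::uint16_t b) {
    std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

A round trip of 3.14159265f through these two functions gives 3.140625f, which shows the trade-off: the exponent range of float is kept, but only about 2 to 3 decimal digits survive.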

score 1 · Answer 2 · edited Feb 01 '21 at 04:32

First off, I am new to Stack Overflow, so please bear with me.

To answer your question, here are excerpts from the float.h headers, which specify the floating-point parameters, for the following compilers:

Intel Compiler

//Float:
#define FLT_MAX                 3.40282347e+38F

//Double:
#define DBL_MAX                 1.7976931348623157e+308

//Long Double:
#if (__IMFLONGDOUBLE == 64) || defined(__LONGDOUBLE_AS_DOUBLE)
#define LDBL_MAX                1.7976931348623157e+308L
#else
#define LDBL_MAX                1.1897314953572317650213E+4932L
#endif

GCC (MinGW actually gcc 4 or 5)

//Float:
#define FLT_MAX         3.40282347e+38F

//Double:
#define DBL_MAX     1.7976931348623157e+308

//Long Double: (same as double for gcc):
#define LDBL_MAX        1.7976931348623157e+308L

Microsoft

//Float:
#define FLT_MAX         3.40282347e+38F

//Double:
#define DBL_MAX     1.7976931348623157e+308

//Long Double: (same as double for Microsoft):
#define LDBL_MAX            DBL_MAX

So, as you can see, only the Intel compiler provides an 80-bit representation for long double on a "standard" Windows machine.

This data is copied from the respective float.h headers on a Windows machine.

AFAIK, C++Builder also has the 80 bit long double, at least on Win32, but not on Win64. — Rudy Velthuis, Jul 21 '16 at 17:50
This corresponds to the "typical" implementation for float and double, although long double is just double. That's legal, but does offer one difference from "typical" of 80 bit long double. — Mike, Jul 21 '16 at 17:51
The 80-bit floating point stuff probably has to do with Intel's x87 chips from the 70's and 80's, which implemented this in hardware. — robot1208, Jul 21 '16 at 22:12

score 0 · Answer 3 · edited Mar 24 '21 at 01:24

float and double are de-facto standardised on the IEEE single and double precision representations. I would put assuming these sizes in the same category as assuming CHAR_BIT==8. Some older ARM systems did have weird "mixed-endian" doubles, but unless you are working with retro stuff you are unlikely to encounter that nowadays.

long double, on the other hand, is far more variable. Sometimes it's IEEE double precision, sometimes it's 80-bit x87 extended, sometimes it's IEEE quad precision, and sometimes it's a "double double" format made up from two IEEE double precision numbers added together.

So in portable code you can't rely on long double being any better than double.
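If you want to know which of those variants you are actually compiling for, the significand width reported by <cfloat> is usually enough to tell them apart. This is a sketch only: classify_long_double is a hypothetical helper, and the mapping assumes the common widths of 53, 64, 106 and 113 bits for the four formats mentioned above.

```cpp
#include <cassert>
#include <cfloat>
#include <string>

// Guess the long double format from its significand width in radix digits.
// Defaults to what the current compiler reports; other widths are "unknown".
inline std::string classify_long_double(int mant_dig = LDBL_MANT_DIG,
                                        int radix = FLT_RADIX) {
    assert(mant_dig > 0);
    if (radix != 2) return "non-binary format";
    switch (mant_dig) {
        case 53:  return "same width as IEEE double";
        case 64:  return "80-bit x87 extended";
        case 106: return "double-double";
        case 113: return "IEEE quadruple";
        default:  return "unknown";
    }
}
```

Calling classify_long_double() with no arguments reports the current target. Code that really needs more than double precision can turn the same check into a static_assert on LDBL_MANT_DIG, so that it fails at compile time rather than silently losing precision.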
