4

The C++14 draft standard seems rather quiet about the specific requirements for float, double and long double, although these sizes seem to be common:

  • float: IEEE 32-bit floating-point representation (roughly 7 digits of precision, exponent range of 1e-38..1e+38)

  • double: IEEE 64-bit floating-point representation (roughly 16 digits of precision, exponent range of 1e-308..1e+308)

  • long double: 80-bit floating-point representation (roughly 19 digits of precision, exponent range of 1e-4932..1e+4932)

What C++ compilers and systems currently use floating-point sizes other than these?

I'm interested in longer, shorter, and non-binary representations using the standard types, not libraries, as my primary interest is portability of C++ programs.

phuclv
Mike
  • This is a really interesting question, but I'm concerned it might not be a good fit for the Stack Overflow Q&A format because it doesn't have a single, definitive, best answer. That said, I'm still really curious about this! – templatetypedef Jul 21 '16 at 16:12
  • A quick search from the standard doc "There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double". As you can see it only provides the minimum precision not the upper bound. – anurag-jain Jul 21 '16 at 16:15
  • In VS, double and long double offer the same precision. See https://msdn.microsoft.com/en-IN/library/s3f49ktz.aspx. – anurag-jain Jul 21 '16 at 16:23
  • Apparently some platforms/compilers can do 128-bit quadruple precision floating-points: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Computer-language_support – jpw Jul 21 '16 at 16:49
  • jpw: I've heard of references to 128 bit floating-point, and I also have heard there are implementations that use decimal floating-point rather than binary, but I don't know of any specific examples. Do you? – Mike Jul 21 '16 at 17:53
  • @Mike I haven't, but it's way outside my field of knowledge. The Wikipedia page mentions the Oracle Sun Studio compiler (or w/e it's called these days) and Intel doing 128-bit on PPC and Sparc, but as I said, this is outside my domain - I just thought the link might be interesting :) – jpw Jul 21 '16 at 18:46
  • @anurag-jain there's nothing strange about that. Most non-x86 architectures like ARM, MIPS, SPARC... don't have an extended precision floating-point type, so they map `long double` to the same type as double. Only recently has GCC started migrating `long double` to IEEE quadruple precision – phuclv Mar 16 '20 at 01:50

3 Answers

4

It's unclear what "uncommon sizes" you're talking about.

Here's a summary of most of the available floating-point formats. See also "Do any real-world CPUs not use IEEE 754?". More details follow in the next section.


Types in C++ are generally mapped to hardware types for performance reasons, so the floating-point types will be whatever the CPU provides, if it has an FPU at all. In modern computers IEEE-754 is the dominant hardware format, and it satisfies the minimum precision and range requirements that C++ inherits from the C standard (at least 6 decimal digits for float and 10 for double), so float and double are in practice mapped to IEEE-754 single and double precision respectively.

Hardware support for higher-precision types is uncommon outside x86 and a few other rare platforms with 80-bit extended precision, so on most other platforms long double is mapped to the same type as double. Recently, however, long double has slowly been migrating to IEEE-754 quadruple precision in compilers like GCC and Clang. Since that format is implemented with a built-in software library, performance is a lot worse. You're still free to choose what long double maps to, depending on whether you favor faster execution or higher precision. For example, on x86 GCC has the -mlong-double-64/80/128 options to set the format of long double and the -m96bit-long-double/-m128bit-long-double options to set its padding. Similar options are also available on many other architectures, like S/390 and zSeries.

PowerPC, on the other hand, by default uses a completely different 128-bit long double format that is implemented using double-double arithmetic and has the same range as IEEE-754 double precision. Its precision is slightly lower than quadruple precision, but it's a lot faster because it can use the hardware double arithmetic. As above, you can choose between the two formats with the -mabi=ibmlongdouble and -mabi=ieeelongdouble options. The same trick is also used on some platforms where only a 32-bit float is supported, to get near-double precision.

IBM z mainframes traditionally use the IBM hexadecimal floating-point formats and still use them today. However, they also support IEEE-754 binary and decimal floating-point types:

The format of floating-point numbers can be either base 16 S/390® hexadecimal format, base 2 IEEE-754 binary format, or base 10 IEEE-754 decimal format. The formats are based on three operand lengths for hexadecimal and binary: short (32 bits), long (64 bits), and extended (128 bits). The formats are also based on three operand lengths for decimal: _Decimal32 (32 bits), _Decimal64 (64 bits), and _Decimal128 (128 bits).

Floating-point numbers

Other architectures, like VAX or Cray, may have other floating-point formats. However, since those machines are still in use, their newer hardware versions also include support for IEEE-754, just as IBM did with its mainframes.

On modern platforms without an FPU, the floating-point types are usually IEEE-754 single and double precision, for better interoperability and library support. On 8-bit microcontrollers, however, even single precision is too costly, so some compilers support a non-standard mode where float is a 24-bit type. For example, the XC8 compiler uses a 24-bit floating-point format that is a truncated form of the 32-bit format, and NXP's MRK uses a different 24-bit float format.

Due to the rise of graphics and AI applications that need a narrower floating-point type, 16-bit formats like IEEE-754 binary16 and Google's bfloat16 have also been introduced on many platforms, and compilers have some limited support for them, like __fp16 in GCC.

phuclv
1

First off, I am new to Stack Overflow, so please bear with me.

To answer your question, here are the relevant floating-point parameters from the float.h headers of three compilers:

  1. Intel Compiler

    //Float:
    #define FLT_MAX                 3.40282347e+38F
    
    //Double:
    #define DBL_MAX                 1.7976931348623157e+308
    
    //Long Double:
    #if (__IMFLONGDOUBLE == 64) || defined(__LONGDOUBLE_AS_DOUBLE)
    #define LDBL_MAX                1.7976931348623157e+308L
    #else
    #define LDBL_MAX                1.1897314953572317650213E+4932L
    #endif
    
  2. GCC (actually MinGW, GCC 4 or 5)

    //Float:
    #define FLT_MAX         3.40282347e+38F
    
    //Double:
    #define DBL_MAX     1.7976931348623157e+308
    
    //Long Double: (same as double for gcc):
    #define LDBL_MAX        1.7976931348623157e+308L
    
  3. Microsoft

    //Float:
    #define FLT_MAX         3.40282347e+38F
    
    //Double:
    #define DBL_MAX     1.7976931348623157e+308
    
    //Long Double: (same as double for Microsoft):
    #define LDBL_MAX            DBL_MAX
    

So, as you can see, only the Intel compiler provides an 80-bit representation for long double on a "standard" Windows machine.

This data is copied from the respective float.h headers from a Windows machine.

phuclv
Kolyan1
  • AFAIK, C++Builder also has the 80 bit long double, at least on Win32, but not on Win64. – Rudy Velthuis Jul 21 '16 at 17:50
  • This corresponds to the "typical" implementation for float and double, although long double is just double. That's legal, but does offer one difference from "typical" of 80 bit long double. – Mike Jul 21 '16 at 17:51
  • The 80-bit floating point stuff probably has to do with Intel's x87 chips from the 70's and 80's, which implemented this in hardware. – robot1208 Jul 21 '16 at 22:12
  • The IBM compiler has 80-bit doubles. – user207421 Mar 21 '20 at 19:35
0

float and double are de-facto standardised on the IEEE single and double precision representations. I would put assuming these formats in the same category as assuming CHAR_BIT == 8. Some older ARM systems did have weird "mixed-endian" doubles, but unless you are working with retro stuff you are unlikely to encounter that nowadays.

long double, on the other hand, is far more variable. Sometimes it's IEEE double precision, sometimes it's 80-bit x87 extended, sometimes it's IEEE quad precision, and sometimes it's a "double double" format made up from two IEEE double precision numbers added together.

So in portable code you can't rely on long double being any better than double.

phuclv
plugwash