According to Wikipedia, the layouts of the different precision data types are

I wrote a small C++ program (compiled with g++) to output the numerical limits for float, double and long double:

#include <iostream>
#include <limits>
#include <string>

template <typename T>
void print(const std::string& name) {
    std::cout << name << " (" << sizeof(T) * 8 << "): "
              << std::numeric_limits<T>::epsilon() << "\t"
              << std::numeric_limits<T>::min() << "\t"
              << std::numeric_limits<T>::max() << std::endl;
}

int main() {
    std::cout.precision(5);
    print<float>("float");
    print<double>("double");
    print<long double>("long double");
    return 0;
}

which outputs (I have run it on multiple machines with the same result)

float (32): 1.1921e-07  1.1755e-38  3.4028e+38
double (64): 2.2204e-16 2.2251e-308 1.7977e+308
long double (128): 1.0842e-19   3.3621e-4932    1.1897e+4932

The upper limits coincide with 2^(2^(e-1)) and, for float and double, epsilon coincides with 2^(-f), where e is the number of exponent bits and f the number of fraction bits. For long double, however, epsilon should be roughly 1.9259e-34 by that logic.

Does anyone know why it isn't?

okruz

1 Answer

long double is not guaranteed to be implemented as IEEE 754 quadruple precision. The C++ reference reads:

long double - extended precision floating point type. Does not necessarily map to types mandated by IEEE-754. Usually 80-bit x87 floating point type on x86 and x86-64 architectures.

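If long double is implemented as 80-bit x86 extended precision, then epsilon is 2^(-63) = 1.0842e-19, because that format has 63 fraction bits after an explicit leading bit. This is the value you get as the output. The 128 bits your program prints are only the storage size (sizeof(long double) * 8), which on such targets includes padding.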

Some compilers support a __float128 type that has quadruple precision. In GCC, long double becomes an alias for __float128 if the -mlong-double-128 command-line option is used, and on x86_64 targets __float128 is guaranteed to be an IEEE quadruple-precision type (implemented in software).

std::numeric_limits is not specialized for __float128. To get the value of epsilon, the following trick can be used (assuming a little-endian machine):

#include <cstdint>
#include <cstring>
#include <iostream>
...

__float128 f1 = 1, f2 = 1;      // 1.q       -> ...00000000
std::uint8_t u = 1;
std::memcpy(&f2, &u, 1);        // 1.q + eps -> ...00000001
std::cout << double(f2 - f1);   // Output: 1.9259e-34

With GCC you can use libquadmath:

#include <quadmath.h>
...

std::cout << (double)FLT128_EPSILON;

to get the same output.

Aconcagua
Evg
  • Thanks! I guess I was irritated by the fact that the long double takes up 128 bits. – okruz Nov 28 '19 at 11:31
  • Your sentence about x86_64 GCC guarantee seems either vague or incorrect. In particular, if you mean that `long double` is guaranteed to be IEEE `binary128` on x86_64 by default, then it's wrong. There's no hardware support for `binary128` on x86_64, so it can't be a default builtin type on GCC. – Ruslan Nov 28 '19 at 12:58
  • @Ruslan, I neither mean that `__float128` has hardware support, nor that `long double` is the same as `__float128` by default. GCC documentation reads: "Support for __float128 (TFmode) IEEE quad type ... is available via the soft-fp library on x86_64 targets.". I edited the answer to clarify it. – Evg Nov 28 '19 at 13:19
  • I've been puzzled just as much as @Ruslan: What does *'it'* refer to? To the last subject occurring before. Unfortunately, that has been *'long double'* (*'[...] becomes an alias [...]'*). Hope you don't mind me having fixed ;) – Aconcagua Nov 28 '19 at 18:37