How can the significand width in bits of float, double be determined: Is there a standard definition?

Question

Is there a standard way to determine the width of the significand of a double, in C or C++? I am aware that the IEEE-754 format of a double stores the significand in 53 bits, but I would like to avoid using a “magic” number in my code.

On Linux, the file /usr/include/ieee754.h exists, but it describes the format using bit fields in a structure, whose sizes I cannot determine at compile time.

A Linux-only solution is acceptable.

Based on *"... mantissa in 53 bits ..."* and *"... determine the size (at compile time)"*, I believe OP is seeking a define or constant for the size of the mantissa in bits. — jww, Jul 03 '19 at 18:45
@jww Yes, I _am_ looking for a define or constant for the size of the significand (mantissa) in bits. — Jamie, Jul 04 '19 at 19:56

score 2 · Accepted Answer · answered Jul 03 '19 at 15:06

Use FLT_MANT_DIG and DBL_MANT_DIG, defined in <float.h>:

#include <float.h>
#include <stdio.h>


#if FLT_RADIX != 2
    #error "Floating-point base is not two."
#endif


int main(void)
{
    printf("There are %d bits in the significand of a float.\n",
        FLT_MANT_DIG);
    printf("There are %d bits in the significand of a double.\n",
        DBL_MANT_DIG);
}

Eric Postpischil

But note that knowing how many bits there are does not tell you which ones they are. Also, the OP may need to account for the leading 1 bit that is implicit in normalized IEEE-754 representations. – John Bollinger Jul 03 '19 at 15:17

Answer 2 · answered Jul 03 '19 at 16:07 · John Bollinger

Is there a standard manner to determine the mantissa of a double?

You're willing to accept a Linux-specific solution, but you claim that glibc's ieee754.h header does not satisfy your needs, so I conclude that the problem you are trying to solve is not extracting or conveying the bits themselves, as that header's union ieee_double would provide a means for you to do that.

I read "the mantissa" as a different thing from "the number of bits of mantissa", so I conclude that DBL_MANT_DIG of float.h is not what you're looking for, either.

The only other thing I can think of that you might mean is the value of the significand (mantissa), according to the standard floating point model:

v = (sign) * significand * radix^exponent

The frexp() function, part of the C standard library since C89, serves this purpose.¹ It separates a double into an exponent (a power of 2) and a significand, represented as a double. For a finite, nonzero input, the absolute value of the returned significand is in the half-open interval [0.5, 1).

Example:

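#include <math.h>
#include <stdio.h>

/* Print d as significand * 2^exponent, as decomposed by frexp(). */
void print_parts(double d) {
    int exponent;   /* avoids shadowing exp() from <math.h> */
    double significand = frexp(d, &exponent);

    printf("%e = %f * 2^%d\n", d, significand, exponent);
}

int main(void) {
    print_parts(7.2563e16);
    print_parts(0.0012);
    print_parts(-0.0);
    return 0;
}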

Sample outputs:

7.256300e+16 = 0.503507 * 2^57
1.200000e-03 = 0.614400 * 2^-9
-0.000000e+00 = -0.000000 * 2^0

Note that although the example function does not print sufficient decimal digits to convey the significands exactly, frexp() itself is exact, not subject to any rounding errors.

¹ Technically, frexp() serves the purpose provided that FLT_RADIX expands to 2. It is well-defined in any case, but if your double representation uses a different radix then its result is probably not what you're looking for.

I apologise that you were forced to determine what I was asking for: I don't want to have to assume that the bit width of the mantissa is 53 bits. I'd rather have a macro that tells my code: "_the bit width of the mantissa for a double is DBL_MANT_BIT_WIDTH_" (say). — Jamie, Jul 04 '19 at 19:48
Well that would be the `DBL_MANT_DIG` macro from `float.h` that both answers reference. Do choose Eric's answer over mine, though, as he was first, guessed correctly what you wanted, and even provided a nice example demonstrating it. — John Bollinger, Jul 05 '19 at 03:52

score 2 · Answer 3 · answered Jul 05 '19 at 14:44

In C++ you might use std::numeric_limits<double>::digits and std::numeric_limits<float>::digits:

#include <limits>
#include <iostream>

int main()
{
    std::cout << std::numeric_limits<float>::digits << "\n";
    std::cout << std::numeric_limits<double>::digits << "\n";
}

prints

24
53

respectively.
