
I want to confirm that the significand representation of the `double` data type in C is always a fraction between 0 and 2, with 2^52 precision, as per the IEEE 754 standard.

Here is where I read that: https://stackoverflow.com/questions/30052710/why-double-can-store-bigger-numbers-than-unsigned-long-long

  • For normalized numbers, the significand is in [1,2), with 53 bits of precision. For subnormals, the significand is in [0,1), with less than 53 bits of precision. – Steve Summit May 22 '22 at 21:34
  • See [bitwise splitting the mantissa of a IEEE 754 double? how to access bit structure](https://stackoverflow.com/questions/69207580). – Steve Summit May 22 '22 at 21:42
  • I said "For normalized numbers, the significand is in [1,2), with 53 bits of precision", but that's kind of misleading. Since one of those bits (the implicit one) never changes, there are "only" 2^52 distinct normalized values. But then there are 2^52 more subnormal values, so on the whole, we say that IEEE-754 doubles have 53 significant bits of precision. – Steve Summit May 22 '22 at 21:58
  • More precisely, the subnormal significands go from `0.0000000000000000000000000000000000000000000000000000` to `0.1111111111111111111111111111111111111111111111111111`, while the normals are from `1.0000000000000000000000000000000000000000000000000000` to `1.1111111111111111111111111111111111111111111111111111`. – Steve Summit May 22 '22 at 22:04
  • @SteveSummit I thought that the subnormals range from 0.0000000000000000000000000000000000000000000000000000 to 0.9999999999999999999999999999999999999999999999999999 and the normals range from 1.0000000000000000000000000000000000000000000000000000 to 1.9999999999999999999999999999999999999999999999999999. – Ahmed Alagha May 22 '22 at 22:16
  • Sorry, the four numbers in the "more precisely" comment are all supposed to be interpreted in base 2. In base 10, the largest subnormal is exactly 0.9999999999999997779553950749686919152736663818359375, and the largest normal is the same fraction but with a leading 1. – Steve Summit May 22 '22 at 22:17
  • @SteveSummit well, this now leads to a very important question. Why do such random-looking numbers appear? Is it something related to the compiler, to the hardware registers of the microprocessor, or to both of them? – Ahmed Alagha May 22 '22 at 22:25
  • Do you mean, why do the decimal representations of those limits look so weird? It's because the IEEE-754 binary64 format is *not* a decimal format — down inside, it's strictly binary. – Steve Summit May 22 '22 at 22:38
  • Consider integers: The number 2^32 in hexadecimal is `0x100000000` — a nice, pretty number. But in decimal, it's 4294967296, which looks pretty random. – Steve Summit May 22 '22 at 22:39
  • N.B. The C standard does not require floating point types to be represented according to the IEEE 754 standard. – chtz May 22 '22 at 22:41

1 Answer


> I want to confirm that the significand representation of the `double` data type in C is always a fraction between 0 and 2, with 2^52 precision, as per the IEEE 754 standard.

No, it is not. The C standard does not require that conforming C implementations use the IEEE-754 binary64 format (also called “double precision”) for the double type.

The C standard describes floating-point numbers as having significands in [0, 1), because it uses a form in which all the significant digits are to the right of the radix point. However, this is simply a matter of scaling and is mathematically equivalent to the more commonly used form in which the first digit is to the left of the radix point. Since this form is suggested by the interval you ask about, [0, 2), this answer uses that form.

Also, “the significand representation” is different from “the significand.” The mathematical significand of an IEEE-754 binary64 number is in [0, 2) (the interval including 0 but excluding 2) for finite numbers. (And it is in [0, 1) for subnormal numbers and [1, 2) for normal numbers.) However, the significand representation is a string of 52 bits combined with one bit from the exponent field. (That bit from the exponent field is 0 if the exponent field is all zeros and 1 if the exponent field is neither all zeros nor all ones. If it is all ones, the significand is not applicable because it is representing an infinity or NaN.)

Further, +∞ and −∞ (plus and minus infinity) are representable numbers in the binary64 format but do not have significands in [0, 2). And the format also provides for representing NaN (Not a Number), which does not have a significand value (although the significand field may provide useful information).

If a C implementation uses IEEE-754 binary64 for double (or any format using base two), and the “value” of a double is a finite number, then its significand is in [0, 2).

Also note that “2^52 precision” is not a good term, at least not without some definition of what that means. What is 3 precision or 8 precision? A number by itself has little meaning. 2^52 is the ratio between the position values of the most and least significant bits in the significand, although subnormal numbers are unable to maintain that span.

Eric Postpischil