Single precision floating point:
  Sign bit: 1
  Exponent: 8 bits
  Mantissa: 23 bits

Double precision floating point:
  Sign bit: 1
  Exponent: 11 bits
  Mantissa: 52 bits
What does this information mean? I don't know the English terms well.
A floating-point quantity (in most situations, not just C) is defined by three numbers: the sign, the significand (also called the "mantissa"), and the exponent. These combine to form a pseudo-real number of the form
sign × significand × 2^exponent
This is similar to scientific notation, except that the numbers are all binary, and the multiplication is by powers of 2, not powers of 10.
For example, the number 4.000 can be represented as
+1 × 1 × 2^2
The number 768.000 can be represented as
+1 × 1.5 × 2^9
The number -0.625 can be represented as
-1 × 1.25 × 2^-1
The number 5.375 can be represented as
+1 × 1.34375 × 2^2
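If you want to see this decomposition for yourself in C, the standard frexpf function from <math.h> splits a float into a significand and a power-of-two exponent. (Note that frexpf normalizes the significand into [0.5, 1), so its breakdown differs from the examples above by one power of 2.) A minimal sketch:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        float values[] = { 4.000f, 768.000f, -0.625f, 5.375f };

        for (int i = 0; i < 4; i++) {
            int exp;
            /* frexpf gives values[i] == sig * 2^exp, with sig in [0.5, 1) */
            float sig = frexpf(values[i], &exp);
            printf("%8g = %+f x 2^%d\n", values[i], sig, exp);
        }
        return 0;
    }

For example, 4.000 comes out as +0.500000 x 2^3, which is the same value as +1 × 1 × 2^2, just with the significand scaled into frexpf's range.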
In any particular floating-point format, you can have different numbers of bits assigned to the different parts. The sign is always 0 (positive) or 1 (negative), so you only ever need one bit for that. The more bits you allocate to the significand, the more precision you can have in your numbers. The more bits you allocate to the exponent, the more range you can have for your numbers.
For example, IEEE 754 single-precision floating point has a total of 24 bits of precision for the significand (which is, yes, one more than your table called out, because there's literally one extra or "hidden" bit). So single-precision floating point has the equivalent of log10(2^24) or about 7.2 decimal digits worth of precision. It has 8 bits for the exponent, which gives us exponent values of about ±127, meaning we can multiply by 2^±127, giving us a decimal range of about ±10^38.
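You don't have to work these limits out by hand: in C, <float.h> publishes them as macros. A small sketch that prints a few of them (the values shown assume an IEEE 754 implementation, which is nearly universal; FLT_DIG is the guaranteed round-trip digit count, which is a little more conservative than the 7.2 figure above):

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        /* Significand size in bits (including the hidden bit), the number of
           decimal digits guaranteed to survive a decimal -> binary -> decimal
           round trip, and the largest finite value */
        printf("float:  %2d significand bits, %2d decimal digits, max %g\n",
               FLT_MANT_DIG, FLT_DIG, FLT_MAX);
        printf("double: %2d significand bits, %2d decimal digits, max %g\n",
               DBL_MANT_DIG, DBL_DIG, DBL_MAX);
        return 0;
    }

On an IEEE 754 machine this reports 24 bits / 6 digits / about 3.4e+38 for float, and 53 bits / 15 digits / about 1.8e+308 for double.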
When you start digging into the details of actual floating-point formats, there are a few more nuances to consider. You might need to understand where the decimal point (really the "binary point" or "radix point") sits with respect to the number that is the significand. You might need to understand the "hidden 1 bit", and the concept of subnormals. You might need to understand how positive and negative exponents are represented, typically by using a bias. You might need to understand the special representations for infinity, and the "not a number" markers. You can read about all of these in general terms in the Wikipedia article on Floating point, or you can read about the specifics of the IEEE 754 floating-point standard which most computers use.
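To make those nuances concrete, here is a small sketch that pulls the three raw fields out of a float, assuming your machine uses 32-bit IEEE 754 single precision (virtually all modern machines do):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        float f = -0.625f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);   /* reinterpret the float's bit pattern */

        unsigned sign     =  bits >> 31;              /*  1 bit                      */
        unsigned exponent = (bits >> 23) & 0xFF;      /*  8 bits, stored with bias 127 */
        unsigned mantissa =  bits & 0x7FFFFF;         /* 23 bits, hidden 1 not stored  */

        /* For -0.625 = -1.25 x 2^-1 this prints:
           sign = 1, biased exponent = 126 (unbiased -1), mantissa = 0x200000 */
        printf("sign = %u, biased exponent = %u (unbiased %d), mantissa = 0x%06X\n",
               sign, exponent, (int)exponent - 127, mantissa);
        return 0;
    }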
Once you understand how binary floating-point numbers work "on the inside", some of their surprising properties begin to make sense. For example, the ordinary-looking decimal fraction 0.1 is not exactly representable! In single precision, the closest you can get is
+1 × 0x1.99999a × 2^-4
or equivalently
+1 × 1.60000002384185791015625 × 2^-4
or equivalently
+1 × 0b1.10011001100110011001101 × 2^-4
which works out to about 0.10000000149. We simply can't get any more precise than that (we can't add any more 0's to the decimal equivalent), because the significand 1.10011001100110011001101 has completely used up our 1+23 available bits of single-precision significance.
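You can see all of this on your own machine: C's %a conversion prints the exact stored value in hexadecimal significand-and-exponent form, and a long %f shows where the decimal equivalent drifts away from 0.1. A minimal sketch:

    #include <stdio.h>

    int main(void)
    {
        float tenth = 0.1f;

        /* %a shows the exact binary value in hex-significand form;
           the extra digits in %.17f show where 0.1 stopped being exact */
        printf("%a\n", tenth);      /* typically prints 0x1.99999ap-4       */
        printf("%.17f\n", tenth);   /* typically prints 0.10000000149011612 */
        return 0;
    }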
You can read more about such floating point "surprises" at this canonical SO question, and this one, and this one.
Footnote: I said everything was based on "a pseudo-real number of the form sign × significand × 2^exponent", but strictly speaking, it's more like (-1)^sign × significand × 2^exponent. That is, the 1-bit sign component is 0 for positive, and 1 for negative.