float and double
(I'll explain float, that is the IEEE-754 single-precision floating-point format; double, that is the IEEE-754 double-precision floating-point format, is the same but with bigger fields: 11 bits of exponent and 52 bits of mantissa instead of 8 and 23.)
In general you can imagine a float to be:
mantissa₂ * (2 ^ exponent₂)
where mantissa₂ means the mantissa in base two, and exponent₂ means the exponent in base two.
The mantissa₂ is 23 bits and the exponent₂ is 8 bits. There is an extra bit for the sign, and the exponent₂ is stored in a special format with a special range that we will see further below.
There is another trick: floating-point numbers are normally stored in "normalized" form:
1₂.mantissa₂ * (2 ^ exponent₂)
so the first digit is always 1₂ and doesn't need to be stored. The complete mantissa₂ is that implicit 1₂ plus the 23 stored binary digits, for a total of 24 digits.
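To make the layout concrete, here is a minimal sketch that pulls the three fields out of a single-precision value and rebuilds it with the formula above. It assumes Python, whose own float is a double, so struct's "f" format is used to round to single precision first; float32_fields and rebuild_normal are hypothetical helper names, and rebuild_normal only covers normalized numbers (the bias of 127 is explained further below).

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into sign, stored exponent and stored mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF        # 23 stored bits; the leading 1 is implicit
    return sign, exponent, mantissa

def rebuild_normal(sign: int, exponent: int, mantissa: int) -> float:
    """(-1)^sign * 1.mantissa * 2^(exponent - 127), valid only for normalized numbers."""
    significand = 1 + mantissa / 2**23  # put back the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

fields = float32_fields(-6.5)      # -6.5 = -1.101₂ * 2^2
print(fields)                      # (1, 129, 5242880)
print(rebuild_normal(*fields))     # -6.5
```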
Now, with 24 bits you can have integers between 0 and 16,777,215 (2^24 - 1), that is 7 full digits plus an 8th that is "partial" (you can't reach 26,777,216 with 24 bits, for example). In fact log₁₀ 2^24 = 7.22471989594.
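A quick way to see the 24-bit limit is to round integers around 2^24 to single precision. This is a sketch assuming Python, with to_float32 as a hypothetical helper that rounds a double to the nearest single-precision value through the struct module; 2^24 + 1 would need a 25th significant bit, so it gets rounded.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Integers up to 2^24 fit in the 24-digit complete mantissa...
print(to_float32(16_777_215.0))  # 16777215.0
print(to_float32(16_777_216.0))  # 16777216.0
# ...but 2^24 + 1 needs 25 significant bits, so it is rounded to a neighbour.
print(to_float32(16_777_217.0))  # 16777216.0
print(math.log10(2**24))         # about 7.2247 decimal digits
```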
The exponent "moves" the binary point (the base-two equivalent of the decimal point), so that you can have, for example
11111111111111111111111.1₂ (24 binary digits 1 in total)
or
1111111111111111111111.11₂
or
1111111111111111111111110₂
or
11111111111111111111111100₂
and so on.
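The same 24 binary digits with the point in different places can be produced with math.ldexp, which computes mantissa * 2**exponent. This is a sketch assuming Python; to_float32 is again a hypothetical helper, used here only to check that each shifted value still fits exactly in a single-precision float.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

mantissa = 0xFFFFFF  # 24 binary digits, all 1 (implicit 1 plus 23 stored bits)

# Negative exponents move the binary point left, positive ones move it right.
for exponent in (-2, -1, 0, 1, 2):
    value = math.ldexp(mantissa, exponent)  # exact: only the point moves
    assert to_float32(value) == value       # still a valid single-precision value
    print(exponent, value)
```

With exponent -1 this prints 8388607.5, which is the first example above (23 ones, the point, then one more 1).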
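The exponent₂ is stored as an 8-bit unsigned value with a bias of 127 (the actual exponent is the stored value minus 127), and the stored value has three ranges: [1;254], that gives actual exponents in [-126;127]; 0, for denormalized numbers (and zero); and 255 (all the bits of the exponent set to 1), for NaN and Infinite.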
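Those ranges can be checked by looking only at the stored exponent field. The sketch below assumes Python and its struct module; classify_float32 is a hypothetical name, and the labels it returns are just descriptive strings. Note that 1.0 has an actual exponent of 0, which is a perfectly normal value.

```python
import math
import struct

def classify_float32(x: float) -> str:
    """Classify a single-precision value by its stored 8-bit exponent field."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 255:  # all exponent bits set to 1
        return "infinite" if mantissa == 0 else "NaN"
    return f"normal, actual exponent {exponent - 127}"  # remove the bias

for value in (1.0, 0.15625, 1e-40, 0.0, math.inf, math.nan):
    print(value, "->", classify_float32(value))
```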
When the exponent is negative, the binary point is moved to the left by that many places; when it is positive, it is moved to the right in the same way.
If the stored exponent is 0, the number is "denormalized". Denormalized numbers are ugly floating-point numbers that need special handling, and on many CPUs they are slower for this reason. When the number is denormalized there is no implicit 1₂ at the beginning, so you have at most 23 bits of mantissa, that is 6-point-something digits of precision (log₁₀ 2^23 = 6.9236899), and even fewer as the number gets smaller, because the leading zeros of the mantissa don't count.
I can't explain for certain where the figure of 9 digits of precision comes from.
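A plausible guess, not a confirmed explanation: 9 significant decimal digits are always enough to print any float and read it back to exactly the same value, while 8 are not (C's FLT_DECIMAL_DIG is 9 for this format). The sketch below assumes Python; to_float32 is a hypothetical helper, and the two values are neighbouring single-precision numbers just above 10, where the spacing between floats is 2^-20.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = 10 + 10 * 2**-20   # 10.0000095367431640625
b = 10 + 11 * 2**-20   # 10.00001049041748046875, the next float after a
assert to_float32(a) == a and to_float32(b) == b

print(f"{a:.8g} {b:.8g}")  # 10.00001 10.00001    (8 digits: indistinguishable)
print(f"{a:.9g} {b:.9g}")  # 10.0000095 10.0000105 (9 digits: distinct)

# Printing with 9 significant digits and parsing back recovers each float exactly.
for x in (a, b):
    assert to_float32(float(f"{x:.9g}")) == x
```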
decimal
With decimal it is easy: the format is:
mantissa₂ / (10 ^ exponent)
where mantissa₂ is a 96-bit unsigned integer and the exponent is a power of ten that needs only 5 bits (a little less, actually: the range is [0;28]), plus there is a sign bit and many unused bits. The exact format is written in the reference source. In decimals there is no implicit initial 1₂, so it is pure 96 bits, and log₁₀ 2^96 = 28.8988795837, so 28 or 29 digits.
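As an illustration, here is a sketch that computes the value of a decimal from its four 32-bit parts. It assumes the layout described in the .NET documentation for Decimal.GetBits (low, middle and high 32 bits of the mantissa, then a flags word with the power of ten in bits 16 to 23 and the sign in bit 31); decimal_value is a hypothetical Python function, not the .NET implementation, and it uses Fraction so the arithmetic stays exact.

```python
from fractions import Fraction

def decimal_value(lo: int, mid: int, hi: int, flags: int) -> Fraction:
    """Value of a .NET-style decimal from its four 32-bit parts (layout as in Decimal.GetBits)."""
    mantissa = (hi << 64) | (mid << 32) | lo  # 96-bit unsigned integer, no implicit 1
    scale = (flags >> 16) & 0xFF              # bits 16-23: power of ten to divide by
    negative = (flags >> 31) & 1              # bit 31: sign
    if scale > 28:
        raise ValueError("scale must be in [0;28]")
    value = Fraction(mantissa, 10 ** scale)
    return -value if negative else value

print(decimal_value(1234, 0, 0, 2 << 16))                    # 617/50, that is 12.34
print(decimal_value(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0))  # 79228162514264337593543950335
```

The largest mantissa, 2^96 - 1, has 29 decimal digits, but not every 29-digit number fits, which is where the "28 or 29 digits" comes from.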