Disclaimer
This is an attempt to provide an easy-to-understand explanation of how floating point encoding works. It is a simplification and it does not cover the technical aspects of the real IEEE 754 floating point standard (normalization, signed zero, infinities, NaNs, rounding, etc.). However, the idea presented here is correct.
Understanding how floating point numbers work is severely impeded by the fact that computers work with numbers in base 2, while humans don't handle them easily. I'll try to explain how floating point numbers work using base 10.
Let's construct a floating point number representation using signs and base 10 digits (i.e. the usual digits from 0 to 9 that we use on a daily basis).
Let's say we have 10 square cells and each cell can hold either a sign (+ or -) or a decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8 or 9).
We can use the 10 cells to store signed integer numbers: one cell for the sign and 9 cells for the digits of the value:
sign -+   +-------- 9 decimal digits -----+
      v   v                               v
    +---+---+---+---+---+---+---+---+---+---+
    | + | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 0 | 0 |
    +---+---+---+---+---+---+---+---+---+---+
This is how the value 1500 is represented as an integer.
We can also use them to store floating point numbers. For example, 7 cells for the mantissa (a sign and 6 digits) and 3 cells for the exponent (a sign and 2 digits):
      +------ sign digits --------+
      v                           v
    +---+---+---+---+---+---+---+---+---+---+
    | + | 0 | 0 | 0 | 1 | 5 | 0 | + | 0 | 1 |
    +---+---+---+---+---+---+---+---+---+---+
    |<-------- Mantissa ------->|<-- Exp -->|
This is one of the possible representations of 1500 as a floating point value (using our 10-cell decimal representation).
The value of the mantissa (M) is +150 and the value of the exponent (E) is +1. The value represented above is:
V = M * 10^E = 150 * 10^1 = 1500
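The formula above can be sketched in a few lines of Python. This is only an illustration of the toy 10-cell layout, not of IEEE 754; the function name `decode` and the idea of passing the cells as a 10-character string are assumptions made for the example.

```python
from fractions import Fraction

def decode(cells):
    """Decode the toy layout: sign + 6 mantissa digits, then sign + 2 exponent digits."""
    if len(cells) != 10:
        raise ValueError("expected exactly 10 cells")
    mantissa = int(cells[0:7])    # "+000150" -> 150
    exponent = int(cells[7:10])   # "+01" -> 1
    # Fraction keeps the result exact even when the exponent is negative
    return mantissa * Fraction(10) ** exponent

print(decode("+000150+01"))  # 1500
```

The same value has other encodings in this layout: `decode("+001500+00")` and `decode("+000015+02")` also give 1500, which is why the diagram shows only one of the possible representations.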
The ranges
The integer representation can store signed values between -(10^9-1) (-999,999,999) and +(10^9-1) (+999,999,999). Moreover, it can represent each and every integer value between these limits. Furthermore, each value has a single representation and that representation is exact.
The floating point representation can store signed values for the mantissa (M) between -999,999 and +999,999 and for the exponent (E) between -99 and +99.
It can store values between -999,999*10^99 and +999,999*10^99. These numbers have 105 digits, far more than the 9 digits of the biggest numbers represented as integers above.
The loss of precision
Note that for integer values, M stores the sign and the first 6 digits of the value (or fewer) and E is the number of digits that did not fit into M.
V = M * 10^E
Let's try to represent V = +987,654,321 using our floating point encoding.
Because M is limited to +999,999, it can only store +987,654, and E will be +3 (the last 3 digits of V cannot fit in M).
Putting them together:
+987,654 * 10^(+3) = +987,654,000
This is not our original value of V, but it is the best approximation we can get using this representation.
Note that all the numbers between (and including) +987,654,000 and +987,654,999 are approximated using the same value (M=+987,654, E=+3). Also, there is no way to store digits after the decimal point for numbers greater than +999,999.
As a general rule, for numbers bigger than the maximum value of M (+999,999), this method produces the same representation for all values from M*10^E (inclusive) up to (M+1)*10^E (exclusive), whether they are integer or real values.
Conclusion
For large values (larger than the maximum value of M), the floating point representation has gaps between the numbers it can represent. These gaps become bigger and bigger as the value of E increases.
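To make the growth of the gaps concrete, here is a small sketch that computes the distance between two neighbouring representable values for a given exponent. The function name `gap` and the sample mantissa are arbitrary choices for the illustration.

```python
def gap(e):
    """Distance between two neighbouring representable values that share exponent e."""
    m = 123_456  # arbitrary sample mantissa; any M gives the same result
    return (m + 1) * 10**e - m * 10**e

for e in (0, 3, 6, 9):
    print(f"E={e}: gap = {gap(e):,}")
```

The gap is always 10^E, so each increment of the exponent makes the holes between representable numbers ten times wider.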
The entire idea of "floating point" is to store a dozen or so of the most significant digits (the beginning of the number) together with the magnitude of the number.
Let's take the speed of light as an example. Its value is about 300,000 km/s. Because it is so large, for most practical purposes you don't care whether it's 300,000.001 km/s or 300,000.326 km/s.
In fact, it is not even that big; a more accurate value is 299,792.458 km/s.
Floating point numbers extract the important characteristics of the speed of light: its magnitude is hundreds of thousands of km/s (E=5) and its value is 3 (hundreds of thousands of km/s).
speed of light = 3*10^5 km/s
Our floating point representation can approximate it by 299,792 km/s (M=299,792, E=0).
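As a rough sketch of how a value with a fractional part could be squeezed into the toy format, the function below scales the number until as many leading digits as possible fit into the 6-digit mantissa and then truncates the rest. The name `encode_decimal` and the scaling strategy are assumptions for the example; the text above does not prescribe a particular algorithm, and the limits on E are not checked here.

```python
from decimal import Decimal

MAX_M = 999_999  # largest 6-digit mantissa

def encode_decimal(text):
    """Return (M, E) for a decimal string, keeping at most 6 leading digits."""
    d = Decimal(text)
    e = 0
    # shift fractional digits into the mantissa while there is still room
    while d != d.to_integral_value() and abs(d) * 10 <= MAX_M:
        d *= 10
        e -= 1
    # drop low digits while the mantissa is too large
    while abs(d) > MAX_M:
        d /= 10
        e += 1
    return int(d), e  # int() truncates whatever fraction is left

print(encode_decimal("299792.458"))  # (299792, 0)
```

The three fractional digits are lost because the six digits of the mantissa are already used up by the integer part.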