A look at the top search results for “gradual underflow” does not show a clear and direct answer, so:
IEEE-754 binary floating-point numbers have a regular pattern through most of their range: there is a sign, an exponent, and a significand with a set number of bits (24 for 32-bit float, 53 for 64-bit double). However, the pattern must be interrupted at the ends. At the high end, results too large for the largest exponent are changed to infinity. At the low end, we have a choice.
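A quick way to see both ends of the range is to push a double past each one. This is a minimal sketch that assumes CPython on a platform whose float is IEEE-754 binary64 with subnormals enabled (no flush-to-zero mode); the function names are just for illustration.

```python
import math
import sys

def overflow_example() -> float:
    # The largest finite double times 2 is too large for any exponent: +inf.
    return sys.float_info.max * 2.0

def halvings_to_zero() -> int:
    """Count halvings from the smallest normal double (2**-1022) until 0.0."""
    x = sys.float_info.min
    steps = 0
    while x > 0.0:
        x /= 2.0      # below 2**-1022 these results are subnormals, not zero
        steps += 1
    return steps

print(overflow_example())   # inf
print(halvings_to_zero())   # 53 with gradual underflow
```

With gradual underflow, 52 halvings walk down through the subnormals to 2**-1074, and the 53rd rounds to zero. If every result below the normal range were flushed to zero, the very first halving would already give 0.0.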
One choice would be that if the result is lower than the lowest exponent, the result is rounded to zero. However, IEEE-754 uses a different scheme called gradual underflow. The lowest exponent encoding (an exponent field of all zeros) is reserved for a different format than is used for regular exponents; it represents zero and the subnormal numbers.
With a normal exponent, the 24-bit significand is “1.” followed by 23 bits that are encoded in the significand field. When the number is subnormal, the exponent has the same value as the lowest regular exponent, but the 24-bit significand is “0.” followed by 23 bits. This is gradual underflow because, as numbers get smaller, they have less and less precision (more of the leading bits in the significand are zero) before we reach zero.
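To make the two significand rules concrete, here is a minimal Python sketch (the name decode_float32 is hypothetical) that stores a value as a 32-bit float with the standard struct module, splits out the fields, and rebuilds the value using “1.” for normal numbers and “0.” with the fixed exponent -126 for subnormals.

```python
import struct

def decode_float32(x: float):
    """Return (kind, exponent_field, fraction_field, value) for x stored as binary32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF   # 8-bit biased exponent
    frac = bits & 0x7FFFFF            # the 23 stored significand bits
    if exp_field == 0xFF:
        return "inf/nan", exp_field, frac, x
    if exp_field == 0:
        # Subnormal or zero: significand is "0." + frac, and the exponent is
        # fixed at -126, the same as the lowest normal exponent.
        kind = "subnormal" if frac else "zero"
        value = (frac / 2**23) * 2.0**-126
    else:
        # Normal: significand is "1." + frac, exponent is exp_field - 127.
        kind = "normal"
        value = (1 + frac / 2**23) * 2.0**(exp_field - 127)
    return kind, exp_field, frac, -value if sign else value

for v in (1.0, 2.0**-126, 2.0**-127, 2.0**-149):
    print(v, decode_float32(v))
```

Note that 2**-126 and 2**-127 share the same effective exponent; the subnormal one simply has a leading zero bit, so it carries one bit less of precision. At 2**-149 only the last of the 23 fraction bits is set.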
Gradual underflow has some nice mathematical properties, notably that a-b == 0 if and only if a == b (for finite a and b). With sudden underflow, it would be possible that a-b == 0 even if a and b are different, because the exact value of a-b is too small to be represented in the floating-point format and is rounded to zero. With gradual underflow, all possible values of a-b, for small a and b, are representable because they are just differences between significands scaled by the lowest exponent.
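The following sketch compares the two behaviors on doubles. flush_to_zero is a hypothetical stand-in for sudden underflow (real flush-to-zero hardware modes may also treat subnormal inputs as zero, which this does not model), and property_holds is an illustrative spot check on near-equal tiny values, not a proof.

```python
import math
import random
import sys

MIN_NORMAL = sys.float_info.min  # smallest positive normal double, 2**-1022

def flush_to_zero(x: float) -> float:
    """Hypothetical sudden underflow: any result below the normal range becomes 0."""
    return 0.0 if abs(x) < MIN_NORMAL else x

a = math.ldexp(1.25, -1022)      # normal
b = math.ldexp(1.0, -1022)       # normal (the smallest one)
gradual = a - b                  # 2**-1024: a subnormal, still nonzero
sudden = flush_to_zero(a - b)    # 0.0, although a != b

def property_holds(subtract, trials: int = 10_000, seed: int = 0) -> bool:
    """Spot-check '(x - y == 0) exactly when x == y' on near-equal tiny doubles."""
    rng = random.Random(seed)
    for _ in range(trials):
        i = rng.randrange(2**53)
        j = max(0, i + rng.randint(-2, 2))
        x = math.ldexp(i, -1074)   # i * 2**-1074, exactly representable
        y = math.ldexp(j, -1074)
        if (subtract(x, y) == 0.0) != (x == y):
            return False
    return True

print(gradual, sudden)
print(property_holds(lambda x, y: x - y))                 # True
print(property_holds(lambda x, y: flush_to_zero(x - y)))  # False
```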
Another issue in determining whether floating-point underflow has occurred is that implementations are permitted (by the IEEE-754 standard) to report underflow based on a test either before or after rounding. When calculating a result, a floating-point implementation effectively has to do these steps:
- Calculate the sign, exponent, and significand of the exact result.
- Round the result to fit within the floating-point format. If the significand rounds up, this may increase the exponent.
The standard allows the implementation to report underflow with either:
- Calculate the sign, exponent, and significand of the exact result.
- Is the exponent smaller than the normal range? If so, report underflow.
- Round the result to fit within the floating-point format. If the significand rounds up, this may increase the exponent.
or:
- Calculate the sign, exponent, and significand of the exact result.
- Round the result to the format's precision as though the exponent range were unbounded. If the significand rounds up, this may increase the exponent.
- Is the exponent smaller than the normal range? If so, report underflow.
Thus, two different implementations of floating-point may return different reports about underflow for the same calculation.
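To see how the two tests can disagree, here is a sketch that models exact results as Python Fractions and applies both tests for binary64 (precision 53, lowest normal exponent -1022). For the after-rounding test it rounds to 53 bits as though the exponent range were unbounded, which is how IEEE 754-2008 defines tininess after rounding. Only positive results are handled, and all names are illustrative.

```python
from fractions import Fraction

P = 53                        # binary64 significand precision, in bits
EMIN = -1022                  # binary64 lowest normal exponent
MIN_NORMAL = Fraction(2) ** EMIN

def floor_log2(q: Fraction) -> int:
    """Exponent e with 2**e <= q < 2**(e+1), for q > 0."""
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if q < Fraction(2) ** e:
        e -= 1
    return e

def round_unbounded(q: Fraction, p: int = P) -> Fraction:
    """Round q > 0 to p significant bits (ties to even), ignoring the exponent range."""
    ulp = Fraction(2) ** (floor_log2(q) - p + 1)
    return round(q / ulp) * ulp   # round() on a Fraction rounds half to even

def tiny_before_rounding(q: Fraction) -> bool:
    return q < MIN_NORMAL

def tiny_after_rounding(q: Fraction) -> bool:
    return round_unbounded(q) < MIN_NORMAL

# Exact result just below the normal range: 2**-1022 * (1 - 2**-54).
q = MIN_NORMAL * (1 - Fraction(1, 2**54))
print(tiny_before_rounding(q), tiny_after_rounding(q))
```

For this q the exponent is below the normal range before rounding, but rounding to 53 bits carries it up to exactly 2**-1022. An implementation that tests before rounding reports underflow; one that tests after rounding does not.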
(There are some additional rules about handling underflow. The test above determines whether the result is tiny, and if traps for the underflow exception are enabled, a tiny result signals it. However, if traps from this exception are not enabled (the default) and the result is exact [rounding did not change anything], then the “underflow” is ignored, and the underflow flag is not raised. If the result is inexact, then the underflow flag is raised and an inexact exception is signaled.)
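The default, non-trapping rule can be sketched the same way. This models only the rounding of an exact result to a double and uses the before-rounding tininess test; it relies on float() of a Fraction being correctly rounded (as it is in CPython), does not model traps, and the function name is hypothetical.

```python
import sys
from fractions import Fraction

MIN_NORMAL = Fraction(sys.float_info.min)   # 2**-1022, exactly

def default_underflow_flags(exact: Fraction):
    """Round an exact result to a double and report (result, underflow_flag, inexact)
    under default, non-trapping handling. Tininess is tested before rounding here."""
    rounded = float(exact)                  # assumed correctly rounded (true in CPython)
    inexact = Fraction(rounded) != exact
    tiny = exact != 0 and abs(exact) < MIN_NORMAL
    underflow_flag = tiny and inexact       # a tiny but exact result raises no flag
    return rounded, underflow_flag, inexact

print(default_underflow_flags(Fraction(1, 2**1030)))      # exact subnormal
print(default_underflow_flags(Fraction(1, 3 * 2**1030)))  # inexact subnormal
```

Both results are subnormal, but only the second sets the underflow flag, because 2**-1030 is representable exactly while 2**-1030 / 3 is not.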