Representing floating-point underflow in program output

Question

I have done a thorough search but only seem to find explanations of what underflow is, and how it works, rather than ways to represent it.

Long story short, I'm writing a program to experiment with integer overflow, floating-point overflow (inf) and underflow -- and outputting the effects of these conditions using the printf function. I've been successful with the first two, but can't seem to represent a subnormal number as a result of floating-point underflow. As far as I know, depending on how the system handles the condition, a symptom of underflow could be the loss of a digit in the output, or rounding off to zero.

In an attempt to find the lowest floating-point values, I looked at the float.h header file. FLT_MIN prints as 0.000000 (thus unworkable in demonstrating an operation that leads to underflow), and DEC32_MIN (which I believe holds the smallest possible normalized positive value) keeps being flagged by the compiler as 'undefined' despite the float.h header file being included, which is rather unnerving. After that, I searched the internet for the smallest possible normalized non-zero 32-bit floating-point value and experimented with dividing it in all manner of ways; and yet my system still seems to represent the result in the same format, seemingly avoiding an underflow completely.

I know it seems rather far-fetched that I'm asking to deliberately cause an error and represent it accurately, but it is for educational purposes.

My system handles floats as 32 bits, doubles as 64 bits and long doubles as 128 bits. May as well mention that.

The question is: how can I create an underflow with float, double and long double types and represent it in the output so it is visibly an underflow error?

Alongside an answer, any help in regards to explaining floating-point underflow further would be very much appreciated. I am rather new to C and programming as a whole.

Thank you,

GS

Subnormal numbers will round to 0.000000 if you only print 6 decimals. Otherwise look into functions that will access the floating point status flags to detect underflow. See http://linux.die.net/man/3/feclearexcept — Mats, Feb 18 '16 at 14:06
This could be construed as pedantry, but can we limit answers to IEEE754 floating point types? If so, could you amend the question to this effect? And I don't understand the downvote. — Bathsheba, Feb 18 '16 at 14:08
128-bit long doubles?! That's pretty cool. What OS and hardware is this? — Mark Dickinson, Feb 18 '16 at 14:09
SPARC machines have quadruple precision floating point. (And plenty of code mis-uses it ;-)) — Bathsheba, Feb 18 '16 at 14:29
@EOF -- I assigned FLT_MIN to a variable, and printed it with a %f specifier. I tried adding modifiers, such as .12 and so forth, seemingly ended up as all 0's. Shall I experiment with more modifiers? — Gabriel Saul, Feb 18 '16 at 14:32
@Bathsheba -- I know very little about IEEE standardization as I've just finished my first in-depth chapter on floating-point data types. I'll look into it and amend it accordingly. — Gabriel Saul, Feb 18 '16 at 14:38
@MarkDickinson -- if you're still curious about OS and hardware, I'm using a Toshiba SATELLITE laptop running 64-bit Ubuntu with a dual-core Intel® Celeron(R) CPU N2830 @ 2.16GHz processor. Nothing special. sizeof(long double)*8 returns 128, that's all I know. — Gabriel Saul, Feb 18 '16 at 14:38
@GabrielSaul: Did you try printing with `printf("%g", (double)num);`? `FLT_MIN` can easily be around ~`pow(10, -38)` or so, you'd need ~38 digits of precision for `%f`. — EOF, Feb 18 '16 at 14:45
@GabrielSaul: Ah, right; those aren't 128-bit floating-point, then; they're likely the 80-bit x87 extended precision format (but padded with 6 zero bytes each for alignment reasons). — Mark Dickinson, Feb 18 '16 at 14:46
@EOF -- (double)num would be explicit casting, right? What does that do exactly? And what is the %g specifier? I'll try that out. Haven't used explicit casting or the %g specifier yet, but it's always good to learn new things when solving problems. — Gabriel Saul, Feb 18 '16 at 14:50
@Bathsheba: re SPARC: I believe that's not IEEE binary128 format, but instead a rather horrible double-double format with peculiar numerical properties. — Mark Dickinson, Feb 18 '16 at 14:50
@GabrielSaul: `printf("%f/g/...")` expects an argument of type `double`. For normal functions (with prototypes), you can rely on automatic conversions of arguments, but varargs-functions (like `printf()`) only do default argument conversion, so I'd try getting into the habit of making *sure* that the argument is of the correct type, because if the default promotion doesn't produce the right type, you've got undefined behavior. — EOF, Feb 18 '16 at 14:54
@EOF: So overall, make sure to cast explicitly when using a function that relies on default conversions? I didn't bear that in mind actually, and I'd been learning about how C handles floating-point constants as doubles, plus how printf converts data types for efficiency. Thanks. I'll try that out. I'd been using %f for floats and doubles (I'm learning from C Primer Plus mainly); had no idea about the %g specifier. — Gabriel Saul, Feb 18 '16 at 15:03
@GabrielSaul: For *this* case, you don't need the `(double)`-cast, since `float` is default-promoted to `double`, but if you ever pass an integer to `printf("%f")`, you're in trouble, whereas passing an integer to a function like `fabs()` is fine, provided the prototype for `fabs()` is available to the call-expression. — EOF, Feb 18 '16 at 15:09
Suggest [To print a value and see all its significant digits](http://stackoverflow.com/q/16839658/2410359) — chux - Reinstate Monica, Feb 18 '16 at 16:01
@EOF: Right, so using (double)float_MN with %g represented a non-zero value at least (1.17549e-38). Dividing this value by 10 and printing it again with %g and the (double) cast simply came up as 1.17549e-39. What operation can I use on this value to create an underflow? — Gabriel Saul, Feb 18 '16 at 16:01
To create underflow, start with 1.0 and continually divide by `FLT_RADIX`, (2) until result is 0.0. — chux - Reinstate Monica, Feb 18 '16 at 16:03
Certainly. `double x=1.0; double y; while (x) { y = x; x /= FLT_RADIX; } printf("%e\n", y);` — chux - Reinstate Monica, Feb 18 '16 at 16:13
@chux: Typed up and executed that loop. Outputted 4.940656e-324 on the last line. — Gabriel Saul, Feb 18 '16 at 16:27
So dividing that number by 2 results in an underflow (as the quotient is 0.0). If you need something else, please specify. — chux - Reinstate Monica, Feb 18 '16 at 16:33
@chux: I understand now. I managed to look at what was happening throughout the loop by putting a printf statement with the variables printed. I know now what operations I can do, and how to represent underflow in effect via a loop. Thank you, and thank you to everyone else for all the advice and help. — Gabriel Saul, Feb 18 '16 at 16:44
@chux: Although, I have one last small question. At what point do subnormal numbers begin to emerge? Isn't a subnormal number also a result of underflow? — Gabriel Saul, Feb 18 '16 at 16:46
`DBL_MIN` (smallest normal), `DBL_TRUE_MIN` (smallest subnormal). Possibly the same if the FP implementation does not support subnormals. Q: Isn't a subnormal number also a result of underflow? --> no, but it is a gradual loss of precision. `4.940656e-324-->0.0` is total loss of precision (underflow). — chux - Reinstate Monica, Feb 18 '16 at 17:26
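
Pulling the comments above together, here is a minimal sketch of the same halving loop for `float`, printing each value with `%e` and labelling it as normal, subnormal, or zero by comparing against `FLT_MIN`. The function names `classify` and `halve_until_zero` are made up for illustration and do not come from the thread. The sketch assumes an IEEE 754 `float` with subnormals enabled (no flush-to-zero mode); on other systems the output may differ.

```c
#include <float.h>
#include <stdio.h>

/* Label a non-negative float by comparing it with FLT_MIN,
   the smallest positive normal value (hypothetical helper). */
static const char *classify(float x)
{
    if (x == 0.0f)
        return "zero";
    if (x < FLT_MIN)
        return "subnormal";
    return "normal";
}

/* Start at FLT_MIN and keep halving until the result rounds to zero.
   Prints every step and returns how many subnormal values were seen. */
int halve_until_zero(void)
{
    /* volatile forces each result to be stored as a real float, so the
       compiler cannot fold the loop away or keep extra precision. */
    volatile float x = FLT_MIN;
    int subnormal_steps = 0;

    printf("%-14s %s\n", "value", "class");
    for (;;) {
        float current = x;
        /* %e shows the exponent, so tiny values do not print as 0.000000 */
        printf("%-14.6e %s\n", (double)current, classify(current));
        if (current == 0.0f)
            break;                  /* the previous halving underflowed to zero */
        if (current < FLT_MIN)
            subnormal_steps++;
        x = current / 2.0f;
    }
    return subnormal_steps;
}
```

Calling `halve_until_zero()` from `main` should, under the assumptions above, print a first row of `1.175494e-38 normal`, then a run of subnormal values down to `1.401298e-45`, and finally `0.000000e+00 zero`. The same pattern should carry over to `double` and `long double` by swapping in `DBL_MIN` or `LDBL_MIN` and the matching type (with `%Le` for `long double`).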
