
So I'm trying to learn more about denormalized numbers as defined in the IEEE 754 standard for floating-point numbers. I've already read several articles found via Google, and I've gone through several Stack Overflow posts. However, some of my questions remain unanswered.

First off, just to review my understanding of what a denormalized float is:

Numbers which have fewer bits of precision, and are smaller (in magnitude) than normalized numbers

Essentially, a denormalized float has the ability to represent the SMALLEST (in magnitude) number that can be represented with any floating-point value.

Does that sound correct? Anything more to it than that?

I've read that:

using denormalized numbers comes with a performance cost on many platforms

Any comments on this?

I've also read in one of the articles that

one should "avoid overlap between normalized and denormalized numbers"

Any comments on this?

In some presentations of the IEEE standard, when floating-point ranges are given, the denormalized values are excluded and the tables are labeled as an "effective range", almost as if the presenter were thinking: "We know that denormalized numbers CAN represent the smallest possible floating-point values, but because of certain disadvantages of denormalized numbers, we choose to exclude them from ranges that better fit common use scenarios." It is as if denormalized numbers are not commonly used.

I guess I just keep getting the impression that using denormalized numbers turns out to not be a good thing in most cases?

If I had to answer that question on my own I would want to think that:

Using denormalized numbers is good because you can represent the smallest (in magnitude) numbers possible -- as long as precision is not important, you do not mix them up with normalized numbers, AND the resulting performance of the application fits within requirements.

Using denormalized numbers is a bad thing because most applications do not require representations so small -- the precision loss is detrimental, you can shoot yourself in the foot too easily by mixing them up with normalized numbers, AND the performance is not worth the cost in most cases.

Any comments on these two answers? What else might I be missing or not understand about denormalized numbers?

dtmland
  • See this question for an in-depth discussion of denormals and dealing with them: http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x – fig Feb 26 '14 at 14:41

1 Answer


Essentially, a denormalized float has the ability to represent the SMALLEST (in magnitude) number that can be represented with any floating-point value.

That is correct.
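
To make this concrete, here's a minimal C99 sketch (assuming IEEE-754 doubles and that subnormals are not being flushed to zero) that prints both limits:

```c
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    double smallest_normal    = DBL_MIN;             /* 2^-1022, about 2.2e-308 */
    double smallest_subnormal = nextafter(0.0, 1.0); /* 2^-1074, about 4.9e-324 */
    printf("smallest normal:    %g (%a)\n", smallest_normal, smallest_normal);
    printf("smallest subnormal: %g (%a)\n", smallest_subnormal, smallest_subnormal);
    return 0;
}
```

The subnormal limit is 2^52 times smaller than `DBL_MIN`; C11 also exposes it directly as `DBL_TRUE_MIN`.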

using denormalized numbers comes with a performance cost on many platforms

The penalty is different on different processors, but it can be up to 2 orders of magnitude. The reason? The same as for this advice:

one should "avoid overlap between normalized and denormalized numbers"

Here's the key: denormals are a fixed-point "micro-format" within the IEEE-754 floating-point format. In normal numbers, the exponent indicates the position of the binary point. Denormal numbers contain the last 52 bits in fixed-point notation with an exponent of 2^-1074 for doubles.
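
You can see this micro-format directly by pulling a subnormal apart. A small sketch (assuming 64-bit IEEE-754 doubles) showing that the biased exponent field is 0 and the value is exactly the 52 mantissa bits scaled by 2^-1074:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    double d = 3e-310;               /* below DBL_MIN, so stored as a subnormal */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* reinterpret the raw bit pattern */
    uint64_t exponent = (bits >> 52) & 0x7FF;      /* 11-bit exponent field */
    uint64_t mantissa = bits & ((1ULL << 52) - 1); /* 52-bit mantissa field */
    printf("exponent field: %llu\n", (unsigned long long)exponent); /* prints 0 */
    /* The value is the mantissa, read as a fixed-point integer, times 2^-1074: */
    printf("%a == %a\n", d, (double)mantissa * 0x1p-1074);
    return 0;
}
```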

So, denormals are slow because they require special handling. In practice, they occur very rarely, and chip makers don't like to spend too many valuable resources on rare cases.
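
If you want to measure this on your own hardware, here's a rough benchmark sketch (the constant 1e-310 and the loop shape are only illustrative; compile without -ffast-math, which on x86 typically turns on flush-to-zero and hides the effect):

```c
#include <stdio.h>
#include <time.h>

static double spin(double x) {
    volatile double acc = x;           /* volatile keeps the loop from being folded away */
    for (int i = 0; i < 10000000; i++)
        acc = acc * 0.5 + x;           /* converges to 2*x, so it stays in x's regime */
    return acc;
}

int main(void) {
    clock_t t0 = clock(); spin(1.0);    clock_t t1 = clock(); /* normal operands */
    clock_t t2 = clock(); spin(1e-310); clock_t t3 = clock(); /* subnormal operands */
    printf("normal:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("subnormal: %.3f s\n", (double)(t3 - t2) / CLOCKS_PER_SEC);
    return 0;
}
```

On chips that handle subnormal operands with a microcode assist (many x86 generations do), the second loop can be dramatically slower; on hardware with full-speed subnormal support the two times will be close.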

Mixing denormals with normals is slow because then you're mixing formats and you have the additional step of converting between the two.

I guess I just keep getting the impression that using denormalized numbers turns out to not be a good thing in most cases?

Denormals were created for one primary purpose: gradual underflow. It's a way to keep the relative difference between tiny numbers small. If you go straight from the smallest normal number to zero (abrupt underflow), the relative change is infinite. If you go to denormals on underflow, the relative change is still not fully accurate, but at least more reasonable. And that difference shows up in calculations.

To put it a different way: floating-point numbers are not distributed uniformly. There are always the same number of representable values between successive powers of two: 2^52 for double precision. So without denormals, you always end up with a gap between 0 and the smallest floating-point number that is 2^52 times the size of the difference between the smallest two numbers. Denormals fill this gap uniformly.
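
A quick way to verify the uniform fill (again assuming IEEE-754 doubles): the spacing of representable values just above `DBL_MIN` is 2^-1074, and the subnormals continue at exactly that spacing all the way down to zero:

```c
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    double spacing_above_min  = nextafter(DBL_MIN, 1.0) - DBL_MIN; /* ulp just above DBL_MIN */
    double spacing_above_zero = nextafter(0.0, 1.0);               /* smallest subnormal */
    printf("spacing just above DBL_MIN: %a\n", spacing_above_min);  /* 0x1p-1074 */
    printf("spacing just above zero:    %a\n", spacing_above_zero); /* 0x1p-1074 */
    printf("DBL_MIN / spacing = %.0f (i.e. 2^52)\n", DBL_MIN / spacing_above_zero);
    return 0;
}
```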

As an example of the effects of abrupt vs. gradual underflow, look at the mathematically equivalent `x == y` and `x - y == 0`. If `x` and `y` are tiny but different and you use abrupt underflow, then if their difference is less than the minimum cutoff value, their difference will be zero, and so the equivalence is violated.

With gradual underflow, the difference between two tiny but different normal numbers gets to be a denormal, which is still not zero. The equivalence is preserved.
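
Here's a sketch of that equivalence in action (assuming gradual underflow; under flush-to-zero the second comparison would print true even though the first prints false):

```c
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    double x = nextafter(DBL_MIN, 1.0); /* one step above the smallest normal */
    double y = DBL_MIN;
    printf("x == y:     %s\n", x == y ? "true" : "false");         /* false */
    printf("x - y == 0: %s\n", (x - y) == 0.0 ? "true" : "false"); /* false: x - y is the subnormal 0x1p-1074 */
    return 0;
}
```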

So, using denormals on purpose is not advised, because they were designed only as a backup mechanism in exceptional cases.

Jeffrey Sax
  • Mixing subnormal values with normal values does not have any performance penalty greater than using subnormal values alone on any processor I am familiar with. – Eric Postpischil Feb 28 '13 at 20:23
  • While subnormal values are rare, there are applications in which they arise more frequently than desired. One that is not greatly rare is echo effects (and other signal processing filters) when audio input ceases. Without new inputs to keep values in the normal range, the residual values in the filter decrease over time and reach the subnormal range. – Eric Postpischil Feb 28 '13 at 20:25
  • I like the example with `x == y` versus `x - y == 0`. Is it true with IEEE floating-point numbers (if we disregard not-a-number and infinity) that subtraction has this property? I mean that `x - y == 0` **only if** `x == y`. – Jeppe Stig Nielsen Mar 02 '13 at 22:15
  • @JeppeStigNielsen Yes, it is true for all IEEE floating-point numbers, with the only exceptions `x,y = +inf,+inf` and `x,y = -inf,-inf`. It works if either `x` or `y` is NaN, since in that case `x - y` is NaN, so `x - y == 0` is false, and `x == y` is also false by definition. – Jeffrey Sax Mar 03 '13 at 02:38
  • From what I understand, the intention with IEEE-754 was that 32-bit and 64-bit floating-point values would be unpacked into a format which did not presume a leading "1", and operations would be performed using that format. Once a sequence of operations was done, the result of that sequence would then be converted back to a 32-bit or 64-bit format. In the common scenarios where multiple operations would be chained together, this approach would minimize the overhead of using denormals, since the intermediate-value format didn't have to do anything special with them. – supercat Apr 08 '15 at 23:39
  • The IEEE's intended usage pattern would have been great if languages had made the intermediate type available to programmers. Unfortunately, when ANSI C added "long double" it failed to provide a means by which a variable-argument method could specify in its prototype what kind of floating-point numbers it wanted, which would have been a prerequisite to cleanly supporting any floating-point type longer than "double" without creating compatibility problems. As a consequence, many compiler vendors decided the way to avoid compatibility problems was to simply not give programmers the type needed for proper numerical computations. – supercat Apr 08 '15 at 23:47