80-bit floating point and subnormal numbers

Question

I am trying to convert an 80-bit extended precision floating point number (in a buffer) to double. The buffer basically contains the content of an x87 register.

This question helped me get started as I wasn't all that familiar with the IEEE standard. Anyway, I am struggling to find useful info on subnormal (or denormalized) numbers in the 80-bit format. What I know is that unlike float32 or float64 it doesn't have a hidden bit in the mantissa (no implied addition of 1.0), so one way to know if a number is normalized is to check if the highest bit in the mantissa is set. That leaves me with the following question:

From what Wikipedia tells me, float32 and float64 indicate a subnormal number with a (biased) exponent of 0 and a non-zero mantissa.

What does that tell me in an 80-bit float?
Can 80-bit floats with a mantissa < 1.0 even have a non-zero exponent?
Alternatively, can 80-bit floats with an exponent of 0 even have a mantissa >= 1.0?

EDIT: I guess the question boils down to:

Can I expect the FPU to sanitize exponent and highest mantissa bit in x87 registers?

If not, what kind of number should the conversion result in? Should I ignore the exponent altogether in that case? Or is it qNaN?

EDIT:

I read the FPU section in the Intel manual (Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture) which was less scary than I had feared. As it turns out the following values are not defined:

exponent == 0 + mantissa with the highest bit set
exponent != 0 + mantissa without the highest bit set
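
The combinations above can be told apart with a small classifier. This is only a sketch, assuming the buffer holds the 10-byte little-endian memory image of an x87 register (64-bit significand first, then sign and 15-bit biased exponent). `decode_x87`, `classify_x87`, `X87Fields` and the class names are hypothetical labels chosen for illustration, not an existing API.

```cpp
#include <cstdint>

// Hypothetical labels for the exponent / integer-bit combinations.
enum class X87Class {
    Zero, Denormal, PseudoDenormal,   // exponent == 0
    Normal, Unnormal,                 // 0 < exponent < 0x7FFF
    Infinity, NaN, PseudoSpecial      // exponent == 0x7FFF
};

struct X87Fields {
    bool sign;
    std::uint16_t exponent;     // 15-bit biased exponent
    std::uint64_t significand;  // bit 63 is the explicit integer bit
};

// Assumes a 10-byte little-endian x87 image; assembled byte by byte so the
// host's own endianness does not matter.
inline X87Fields decode_x87(const unsigned char* buf) {
    std::uint64_t sig = 0;
    for (int i = 7; i >= 0; --i) sig = (sig << 8) | buf[i];
    const unsigned se = buf[8] | (buf[9] << 8);
    return { (se & 0x8000u) != 0, static_cast<std::uint16_t>(se & 0x7FFFu), sig };
}

inline X87Class classify_x87(const X87Fields& f) {
    const bool integer_bit = (f.significand >> 63) != 0;
    const std::uint64_t fraction = f.significand & 0x7FFFFFFFFFFFFFFFull;
    if (f.exponent == 0) {
        if (f.significand == 0) return X87Class::Zero;
        // exponent == 0 with the highest bit set is the first case listed above
        return integer_bit ? X87Class::PseudoDenormal : X87Class::Denormal;
    }
    if (f.exponent == 0x7FFF) {
        if (!integer_bit) return X87Class::PseudoSpecial;
        return fraction == 0 ? X87Class::Infinity : X87Class::NaN;
    }
    // exponent != 0 without the highest bit set is the second case listed above
    return integer_bit ? X87Class::Normal : X87Class::Unnormal;
}
```

A converter can then decide per class what to produce, for example mapping `Unnormal` and `PseudoSpecial` to NaN.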

It doesn't mention if these values can appear in the wild, nor if they are internally converted. So I actually dusted off Ollydbg and manually set bits in the x87 registers. I crafted ST(0) to contain all bits set in the exponent and a mantissa of 0. Then I made it execute

FSTP QWORD [ESP]
FLD QWORD [ESP]

The value stored at [ESP] was converted to a signaling NaN. After the FLD, ST(0) contained a quiet NaN.

I guess that answers my question. I accepted J-16 SDiZ's solution because it's the most straightforward solution (although it doesn't explicitly explain some of the finer details).

Anyway, case solved. Thanks, everybody.

Probably going to need to ask somebody that knows assembler to put it back in a register that is associated with a particular variable. — Martin York, Aug 06 '11 at 14:12
@David Heffernan: I do but C++ (hence the tag) does not guarantee that long double is 80 bits in size. In fact, VC++ defines long double and double to be the same size (64 bits). Inline assembler seems like the only way to get a perfect conversion (there is code in the question I linked) but I prefer using plain C++, especially since there is no inline assembler in VC++ 64-bit. — pezcode, Aug 06 '11 at 14:44
The non-hidden initial `1` certainly makes it possible to write down bitpatterns that don't correspond to standard floats. I'm not sure if anyone is required to "normalize" those at some point. Surely a denormal would require the exponent to be zero. — Kerrek SB, Aug 06 '11 at 14:46
@pezcode I think I'd be looking to receive the information in a more malleable format, i.e. double or a string. — David Heffernan, Aug 06 '11 at 14:48
@David Heffernan: The buffer comes like this from a Windows CONTEXT struct, I'm afraid there is no other feasible way to get that information. — pezcode, Aug 06 '11 at 15:20
Well, you're not going to get that info from IEEE and the Wikipedia quotes sure sound like nonsense. Intel manual required. Why don't you just convert it to double and *then* find out what you got? — Hans Passant, Aug 06 '11 at 15:30

Anders Lindahl · Answer 1 · 2011-08-06T14:58:56.563

3

The difficulty in finding information on subnormal 80-bit numbers might be because the 8087 does not make use of any special denormalization for them. I found this on MSDN's page on Type float (C):

The values listed in this table apply only to normalized floating-point numbers; denormalized floating-point numbers have a smaller minimum value. Note that numbers retained in 80x87 registers are always represented in 80-bit normalized form; numbers can only be represented in denormalized form when stored in 32-bit or 64-bit floating-point variables (variables of type float and type long).

Edit

The above might be true for how Microsoft makes use of the FPU's registers. I found another source that indicates this:

FPU Data types:

The 80x87 FPU generally stores values in a normalized format. When a floating point number is normalized, the H.O. bit is always one. In the 32 and 64 bit floating point formats, the 80x87 does not actually store this bit, the 80x87 always assumes that it is one. Therefore, 32 and 64 bit floating point numbers are always normalized. In the extended precision 80 bit floating point format, the 80x87 does not assume that the H.O. bit of the mantissa is one, the H.O. bit of the number appears as part of the string of bits.

Normalized values provide the greatest precision for a given number of bits. However, there are a large number of non-normalized values which we can represent with the 80 bit format. These values are very close to zero and represent the set of values whose mantissa H.O. bit is not zero. The 80x87 FPUs support a special form of 80 bit known as denormalized values.

edited Aug 06 '11 at 14:58

answered Aug 06 '11 at 14:30


2

I think the original motivation of the 80 bit FPU was that you can do 64-bit operations with greater accuracy, but you're always expected to interface via 64-bit floats. In that scenario, you would never create denormal 80-bit floats, because even the smallest 64-bit float is still a normal 80-bit float. – Kerrek SB Aug 06 '11 at 14:49
@Kerrek SB: That doesn't mean it's impossible to feed it two 64-bit floats, calculate a little and get a result that even 80-bits can't hold in normalized form. – pezcode Aug 06 '11 at 15:09
@pezcode: sure, but going back any 80-bit denormal value will be zero in the 64-bit representation, so that's not a problem, just underflow. – Kerrek SB Aug 06 '11 at 15:16
Though it hasn't been mentioned, I wouldn't be surprised if the 80-bit format was designed to allow for the possibility that an implementation might keep intermediate sums in a non-normalized format. In a non-FPU implementation computing x-y+z, when all three values are of similar magnitude, being able to skip normalization after (x-y) could greatly improve performance. – supercat Feb 22 '15 at 00:32
The [wikipedia page](https://en.wikipedia.org/wiki/Extended_precision#x86_Extended_Precision_Format) suggests there are denormal extended precision values too. My favourite language, Delphi, uses Extended as main floating point type and it has denormal values for them too. – Rudy Velthuis Jan 04 '16 at 16:00
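
The two range claims in Kerrek SB's comments above can be checked with a little arithmetic. The sketch below assumes IEEE 754 binary64 for `double`, an exponent bias of 16383 for the 80-bit format, and the default round-to-nearest mode; the function names are hypothetical.

```cpp
#include <cmath>

// Biased 80-bit exponent that a value of magnitude 2^e would carry (bias 16383).
constexpr int x87_biased_exponent(int e) { return e + 16383; }

// The smallest positive subnormal double is 2^-1074. Its 80-bit biased
// exponent is still well above 0, so every nonzero double is a *normal*
// 80-bit value.
static_assert(x87_biased_exponent(-1074) > 0,
              "smallest double is a normal 80-bit value");

// Every 80-bit denormal is smaller than 2^-16382. Scaling 1.0 down that far
// is far below double's range and underflows to zero (round-to-nearest assumed).
inline bool x87_denormals_underflow_to_zero() {
    return std::ldexp(1.0, -16382) == 0.0;
}
```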

score 3 · Accepted Answer · answered Aug 06 '11 at 16:28


Try the SoftFloat library; it has floatx80_to_float32, floatx80_to_float64 and floatx80_to_float128. Detect the native format and act accordingly.
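
If pulling in a library is not an option, a plain C++ conversion might look roughly like the sketch below. It is not SoftFloat's implementation: `x87_to_double` is a hypothetical helper, the buffer is assumed to be the 10-byte little-endian x87 image, and mapping the unsupported encodings to NaN is an assumption of this sketch rather than documented FPU behaviour.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: converts a 10-byte little-endian x87 image to double.
// NaN payloads and the signaling/quiet distinction are not preserved.
inline double x87_to_double(const unsigned char* buf) {
    std::uint64_t sig = 0;                       // 64-bit significand, bit 63 = integer bit
    for (int i = 7; i >= 0; --i) sig = (sig << 8) | buf[i];
    const unsigned se = buf[8] | (buf[9] << 8);  // sign + 15-bit biased exponent
    const bool negative = (se & 0x8000u) != 0;
    const int exponent = static_cast<int>(se & 0x7FFFu);
    const bool integer_bit = (sig >> 63) != 0;

    double result;
    if (exponent == 0x7FFF) {
        const std::uint64_t fraction = sig & 0x7FFFFFFFFFFFFFFFull;
        // Integer bit clear here is an unsupported encoding; treated as NaN (assumption).
        result = (integer_bit && fraction == 0)
                     ? std::numeric_limits<double>::infinity()
                     : std::numeric_limits<double>::quiet_NaN();
    } else if (exponent != 0 && !integer_bit) {
        // exponent != 0 without the highest bit set: treated as NaN (assumption).
        result = std::numeric_limits<double>::quiet_NaN();
    } else {
        // value = sig * 2^(e - 16383 - 63); an exponent field of 0 uses e = 1.
        // The cast rounds sig to 53 bits, so results landing in double's
        // subnormal range may be rounded twice.
        const int e = (exponent == 0) ? 1 : exponent;
        result = std::ldexp(static_cast<double>(sig), e - 16383 - 63);
    }
    return negative ? -result : result;
}
```

With this sketch, any 80-bit denormal comes out as zero, and values too large for double come out as infinity, because `std::ldexp` underflows or overflows in those ranges.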


J-16 SDiZ

