Separate a double into its sign, exponent and mantissa

Question

I've read a few topics that take already broken-down doubles and "put them together", but I am trying to break a double into its base components. So far I have the sign bit nailed down:

void breakDouble( double d ){

    long L = *(long*) &d;

    int sign;
    long mask = 0x8000000000000000L;

    if( (L & mask) == mask ){

        sign = 1;

    } else {

        sign = 0;
    }
    ...
}

But I'm pretty stumped as to how to get the exponent and the mantissa. I got away with forcing the double into a long because only the leading bit mattered so truncation didn't play a role. However, with the other parts I don't think that will work and I know you can't do bitwise operators on floats so I'm stuck.

Thoughts?

edit: of course as soon as I post this I find this, but I'm not sure how different floats and doubles are in this case.

Edit 2 (sorry, working as I go): I read the post I linked in edit 1, and it seems to me that I can perform the same operations on my double, with the mask for the exponent being:

mask = 0x7FF0000000000000L;

and for the mantissa:

mask = 0xFFFFFFFFFFFFFL;

Is this correct?

You have the bits in L, so you can do the bit ops on that. To get what you want is a matter of masking and shifting. I forget what the offsets and shifts are though. — Charlie Burns, Sep 12 '13 at 02:09
Yes but didn't I "break" the double's accuracy by forcing it into a long? — Joshua, Sep 12 '13 at 02:10
No, you did the casting right. You cast a double * to a long * and then took the value. So no bits changed, you never changed the double to a long, you just copied the bits. — Charlie Burns, Sep 12 '13 at 02:11
@Joshua No, you didn't cast the double to a long; you cast its address to (pointer to long) and then dereferenced that, which gives you exactly the bits you expect. The only things you need to watch out for are that `long` and `double` aren't necessarily the same size (though they probably are on your machine) and that endianness could be an issue. — Paul, Sep 12 '13 at 02:12
I suppose at some time in your mathematics lessons you heard of logarithms - http://en.wikipedia.org/wiki/Logarithm — Ed Heal, Sep 12 '13 at 02:13
On typical 64-bit systems, a `double` and a `long` have the same number of bits, namely 64: the tradeoff is that `double` has a greater total range, while `long` has a greater range of exact integer values. Since `*(long*) &d` just reinterprets the `double`'s bits as if they were those of a `long`, no accuracy is sacrificed (on such a system). (Obviously this is not portable.) — ruakh, Sep 12 '13 at 02:13
This is illegal C because it violates the strict aliasing rules. Beware modern compilers and their seemingly limitless ability to creatively misinterpret code. — tmyklebu, Sep 12 '13 at 02:13
Your link for float is good, double is the same way with different offsets and shifts. You can find it easily enough. — Charlie Burns, Sep 12 '13 at 02:13
@RaymondChen: `frexp` does something slightly different. It behaves reasonably when presented with NaNs, infinities, and subnormals and it's rather slower than the bit-hacky way because of it. — tmyklebu, Sep 12 '13 at 02:14
http://en.wikipedia.org/wiki/Double-precision_floating-point_format — Charlie Burns, Sep 12 '13 at 02:20
I have the mantissa working, but the exponent is always 0. I am doing exponent = (L & 0x7FF0000000000000L), which should be 01111111111100...0. That should give me the set bits in the exponent right? — Joshua, Sep 12 '13 at 02:41

tmyklebu · Answer 1 · 2013-09-12T14:04:42.500

4

The bit masks you posted in your second edit look right. However, you should be aware that:

1. Dereferencing `(long *)&mydouble` as you do is a violation of C's aliasing rules. This still flies under most compilers if you pass a flag like gcc's `-fno-strict-aliasing`, but it can lead to problems if you don't. You can cast to `char *` and look at the bits that way. It's more annoying and you have to worry about endianness, but you don't run the risk of compilers screwing everything up. You can also create a union type like the one at the bottom of the post and write into the `d` member while reading from the other three.
2. Minor portability note: `long` isn't the same size everywhere; maybe try using a `uint64_t` instead? (`double` isn't either, but it's fairly clear that this is intended to apply only to IEEE doubles.)
3. The trickery with bit-masks only works for so-called "normal" floating-point numbers: those with a biased exponent that is neither zero (indicating zero or a subnormal) nor 2047 (indicating infinity or NaN).
4. As Raymond Chen points out, the `frexp` function does what you actually probably want. `frexp` handles the subnormal, infinity, and NaN cases in a documented and sane way, but you pay a speed hit for using it.
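The masking and shifting discussed above can be sketched as follows. This is only an illustration under stated assumptions: it assumes an IEEE 754 binary64 `double`, it swaps in `memcpy` into a `uint64_t` (in place of the pointer cast) to copy the bits without an aliasing violation, and the names `double_parts` and `break_double` are made up for this example. Note that masking alone leaves the exponent field in bit positions 52 to 62, so it has to be shifted down by 52; storing the unshifted value in a 32-bit `int` would likely read as 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type; the field names are illustrative only. */
struct double_parts {
    unsigned sign;      /* 0 or 1 */
    unsigned exponent;  /* biased exponent field: 0 = zero/subnormal,
                           2047 = infinity/NaN, otherwise a normal number */
    uint64_t mantissa;  /* low 52 bits, without the implicit leading 1 */
};

/* Assumes double is IEEE 754 binary64 (64 bits wide). */
struct double_parts break_double(double d)
{
    uint64_t bits;
    struct double_parts p;

    assert(sizeof bits == sizeof d);   /* guard the size assumption */

    /* memcpy copies the object representation; no aliasing violation. */
    memcpy(&bits, &d, sizeof bits);

    p.sign     = (unsigned)(bits >> 63);                 /* top bit */
    p.exponent = (unsigned)((bits >> 52) & 0x7FF);       /* next 11 bits */
    p.mantissa = bits & UINT64_C(0xFFFFFFFFFFFFF);       /* low 52 bits */
    return p;
}
```

For `-6.25` (that is, `-1.5625 * 2^2`) this gives sign 1, biased exponent 1025 (subtract 1023 to get 2), and mantissa `0x9000000000000`.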

(Apparently there needs to be some non-list text between a list and a code block. Here it is; eat it up, markdown!)

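union doublebits {
  double d;
  struct {
    /* Layout assumes a little-endian target whose compiler allocates
       bit-fields starting from the least significant bit (e.g. gcc on
       x86-64). One base type for all fields keeps them in one 64-bit unit. */
    unsigned long long mant : 52;
    unsigned long long expo : 11;
    unsigned long long sign : 1;
  };
};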

edited Sep 12 '13 at 14:04

answered Sep 12 '13 at 02:21

tmyklebu


I was planning on using the isnan and isinf functions included in math.h to handle the fringe cases. I don't know what to do about -0 though. As for #2, once I get it running I'll try making it more portable, thanks for the tip. I knew that, but it's easy to forget sometimes. – Joshua Sep 12 '13 at 02:25
@Joshua: Then just use the `frexp` function in `math.h`. This sort of thing is good for a speed hack when you know you only have normal numbers. – tmyklebu Sep 12 '13 at 02:26
That's probably what I will end up doing, but since I made the post I'll figure it out and post the answer for future inquirers :) – Joshua Sep 12 '13 at 02:27
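A minimal sketch of the `frexp` route suggested in the comments above, assuming C99's `<math.h>`. The type `frexp_parts` and the function `split_double` are hypothetical names for this example. `signbit` is used for the sign because it distinguishes `-0.0` from `0.0`, and infinities and NaNs are checked first with `isinf`/`isnan` because the exponent `frexp` stores for them is unspecified.

```c
#include <math.h>

/* Hypothetical result type; the field names are illustrative only. */
struct frexp_parts {
    int    sign;      /* 1 if the sign bit is set (including -0.0), else 0 */
    int    exponent;  /* power of two: |d| == fraction * 2^exponent */
    double fraction;  /* in [0.5, 1) for finite nonzero input, 0 for zero */
};

struct frexp_parts split_double(double d)
{
    struct frexp_parts p;

    p.sign = signbit(d) ? 1 : 0;

    if (isnan(d) || isinf(d)) {
        /* frexp leaves the exponent unspecified here, so pick 0 explicitly. */
        p.exponent = 0;
        p.fraction = fabs(d);
        return p;
    }

    /* Works for normal and subnormal values alike. */
    p.fraction = frexp(fabs(d), &p.exponent);
    return p;
}
```

Unlike the raw bit fields, the value returned here is a fraction in `[0.5, 1)` with an unbiased exponent, so `6.25` comes back as `0.78125 * 2^3` rather than as the stored `1.5625 * 2^2`.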
The “union hack” is not a hack; the C standard says that accessing a union member other than the last one stored reinterprets the bytes in the new type. – Eric Postpischil Sep 12 '13 at 02:46
As @EricPostpischil said, this is standard. C99 also acknowledges the term "type punning" in a footnote. See C99 draft standard N1256, section **6.5.2.3 Structure and union members**, paragraph 3 and footnote #82. – LorenzoDonati4Ukraine-OnStrike Sep 12 '13 at 03:32
@EricPostpischil: Casting a pointer to `double` to a pointer to `union { double; long long; }` and then dereferencing the `union` as a `long long` is forbidden, is it not? – tmyklebu Sep 12 '13 at 13:47
@tmyklebu: The union technique is not to cast to a pointer to a union. It is to store in one union member and read from another. E.g., given `double x`, you can access its encoding with `uint64_t y = (union { double d; uint64_t u; }) {x} .u;`. – Eric Postpischil Sep 12 '13 at 13:51
@EricPostpischil: I'll clarify, then. I've only used the illegal one before, and only when doing network programming. (One fewer copy.) – tmyklebu Sep 12 '13 at 13:58
Note that using a `uint64_t` in a union works in more C implementations than using a struct of bit-fields. Bit-fields are more prone to endian/order issues, whereas a `uint64_t` usually has the expected relationship with a `double`: the high bit is the sign bit, the next eleven are the exponent field, and the low 52 bits are the significand field. When using bit-fields, one ought to ensure that they are arranged as desired, possibly by using preprocessor macros to test for compiler (e.g., `__GNUC__`) and endianness (e.g., `__LITTLE_ENDIAN__`). – Eric Postpischil Sep 12 '13 at 14:28
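The store-to-one-member, read-from-another technique described in these comments might look like the sketch below. It assumes a 64-bit IEEE 754 `double` whose byte order matches that of `uint64_t` (the usual case); the helper names `double_to_bits`, `sign_of`, `exponent_of`, and `significand_of` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the 64-bit encoding of a double. */
uint64_t double_to_bits(double x)
{
    /* Store into the double member, then read the integer member.
       No pointer to the union is ever formed from a double pointer. */
    union { double d; uint64_t u; } pun = { .d = x };
    return pun.u;
}

/* High bit: sign. */
unsigned sign_of(double x)
{
    return (unsigned)(double_to_bits(x) >> 63);
}

/* Next eleven bits: biased exponent field. */
unsigned exponent_of(double x)
{
    return (unsigned)((double_to_bits(x) >> 52) & 0x7FF);
}

/* Low 52 bits: significand field (without the implicit leading 1). */
uint64_t significand_of(double x)
{
    return double_to_bits(x) & UINT64_C(0xFFFFFFFFFFFFF);
}
```

Because the fields are extracted with shifts on the integer value, this does not depend on how a compiler orders bit-fields.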
