How to get the sign, mantissa and exponent of a floating point number

Question

I have a program that runs on two processors, one of which does not have floating-point support. So I need to perform floating-point calculations using fixed point on that processor. For that purpose, I will be using a floating-point emulation library.

I first need to extract the signs, mantissas and exponents of floating-point numbers on the processor that does support floating point. So my question is: how can I get the sign, mantissa and exponent of a single-precision floating-point number?

Following the format from this figure,

[Figure: IEEE 754 single-precision bit layout: sign (1 bit), exponent (8 bits), mantissa (23 bits)]

this is what I've done so far, but apart from the sign, neither the mantissa nor the exponent is correct. I think I'm missing something.

void getSME( int& s, int& m, int& e, float number )
{
    unsigned int* ptr = (unsigned int*)&number;

    s = *ptr >> 31;
    e = *ptr & 0x7f800000;
    e >>= 23;
    m = *ptr & 0x007fffff;
}

Try to start from here: http://en.wikipedia.org/wiki/Single-precision_floating-point_format, but I am almost sure that you saw this — Alex, Mar 28 '13 at 15:02
Aliasing through pointer conversion is not supported by the C standard and may be troublesome in some compilers. It is preferable to use `(union { float f; uint32_t u; }) { number } .u`. This returns a `uint32_t` that is the bytes of the `float` `number` reinterpreted as a 32-bit unsigned integer. — Eric Postpischil, Mar 28 '13 at 15:29
I'm assuming IEEE 754 32 bit binary. Are you aware of the following issues? (1) The exponent is biassed, by adding 127 to the actual exponent. (2) All except very small floats are normalized, and the leading 1 bit of a normalized float mantissa is not stored. — Patricia Shanahan, Mar 28 '13 at 17:05
Three problems: (0) not removing the bias from the encoded exponent; (1) not adding the implicit mantissa bit for normal nonzero numbers; (2) not handling denormals, infinities and sNaNs/qNaNs. — , Sep 22 '18 at 22:28
This code handles IEEE-754 denormals and infinities and NaNs if the caller knows what they are doing. Denormal has zero exponent, nonzero mantissa. Infinity has maximum exponent, zero mantissa. NaN has maximum exponent, nonzero mantissa. The hidden mantissa bit should be set if the exponent is not zero and the exponent is not maximum. — doug65536, Apr 19 '22 at 13:25

score 37 · Accepted Answer · edited Jun 28 '17 at 02:08

I think it is better to use a union to do the conversion; it is clearer.

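#include <stdio.h>

/* Assumes float is IEEE 754 binary32 and that the compiler allocates
   bit-fields starting from the least significant bit (as GCC and Clang
   do on x86). Bit-field layout is implementation-defined, so this is
   not portable. */
typedef union {
  float f;
  struct {
    unsigned int mantissa : 23;  /* stored fraction, hidden bit not included */
    unsigned int exponent : 8;   /* biased by 127 */
    unsigned int sign : 1;
  } parts;
} float_cast;

int main(void) {
  float_cast d1 = { .f = 0.15625 };
  /* On such a platform this prints sign = 0, exponent = 7c, mantissa = 200000. */
  printf("sign = %x\n", (unsigned)d1.parts.sign);
  printf("exponent = %x\n", (unsigned)d1.parts.exponent);
  printf("mantissa = %x\n", (unsigned)d1.parts.mantissa);
  return 0;
}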

Example based on http://en.wikipedia.org/wiki/Single_precision
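As a worked check of how the three fields recombine, assuming IEEE 754 binary32 and a normal number (stored mantissa without the leading 1, exponent stored with a bias of 127): for 0.15625 the fields are sign 0, biased exponent 0x7c = 124 and stored mantissa 0x200000 = 2^21, so

```latex
\[
x = (-1)^{s}\left(1 + \frac{m}{2^{23}}\right) 2^{\,e - 127}
  = (-1)^{0}\left(1 + \frac{2^{21}}{2^{23}}\right) 2^{\,124 - 127}
  = 1.25 \times 2^{-3} = 0.15625
\]
```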

answered Mar 28 '13 at 15:06 by eran · edited by Stargateur

"For some reason, this original purpose of the union got "overriden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation is not a valid use of unions. It generally leads to undefined behavior." http://stackoverflow.com/a/2313676/1127387 – datwelk Nov 02 '13 at 13:50
There's no law that says you have to only use things for what they were originally created for. Otherwise the first plane wouldn't have used bits of bicycle. "Generally" undefined? What about those occasions when it is defined, or when you're happy with the behaviour on a given platform/situation? – Feb 28 '14 at 11:29
I'm not getting the right results. The solution of Xymostech works. – Patricia Nov 27 '15 at 22:18
@Alex There actually is a law that says you can only do some things and not others in C. It's called the "ISO/IEC 9899:2011 specification", colloquially known as the "C language standard". And if that tells you that what you're doing is undefined that means you can get anything back. One day it could work, the next it could give you the wrong result and still the next day it could just crash. That's what "undefined" means. – Voo Feb 24 '16 at 19:08
This method fails when 1) `float` is not IEEE 754 32 bit binary (not so rare) 2) `unsigned` is 16-bit (common in embedded world) 3) endian of `unsigned/float` do not match. (rare). 4) Mathematical interpretation is used for `exponent/mantissa` as this answer shows the biased exponent and the incomplete significand/mantissa. – chux - Reinstate Monica Mar 05 '16 at 17:34
Is the above code portable? What happens on big and little endian machines? – Joe C Mar 07 '16 at 23:39
Very late to the party here, but no, the `union` is not better because it is not guaranteed to work at all. It certainly is not portable. Nothing constrains the C implementation to lay out the bitfields such that the union maps them to the desired pieces of the `float` representation, the separate question of relying on type punning at all notwithstanding. – John Bollinger Oct 25 '16 at 21:27
However, `uint32_t` may be better for specifying a bit-field of no more than 4 bytes. – bumfo Dec 16 '16 at 12:38
Don't do this. It's slower, not portable and depends on the whims of the compiler. – Sep 22 '18 at 21:25
@Voo: it won't change from day to day. Rather from platform to platform or from version to next version. So there is no guarantee it works, but there is nothing that says it won't work either. In many situations, it *will* simply work. If you don't change compiler and/or platform, it will keep on working, even if the standard says it is undefined behaviour. – Rudy Velthuis Sep 24 '18 at 09:39
@Rudy "If you don't change compiler and/or platform, it will keep on working, even if the standard says it is undefined behaviour". I can think of many examples where that claim is simply wrong. Just the most trivial example: do you think that reading uninitialized memory always gives back the same value if you run a program multiple times? You also seem to think that compilers work deterministically when compiling a complex program; nothing could be further from the truth. – Voo Sep 25 '18 at 12:39
Now I guess you might argue that in this case, it's unlikely to be a problem and you're probably right. But that still leaves you with "we have to make sure absolutely everyone uses exactly the same compiler and platform when compiling our code oh and even a minor update to either of those things can wreak havoc with our code". Is that honestly a position you're comfortable in? – Voo Sep 25 '18 at 12:50
@Voo: I can't think of many examples where that is simply wrong. And yes, very often it does give back the same value. Just try it. And compilers work deterministically when compiling. Anything else would be pure chaos. – Rudy Velthuis Sep 25 '18 at 13:05
@Voo: of course not. What I am saying is that undefined behaviour doesn't mean you can't have deterministic behaviour on a single platform, or that programs always behave erratically on certain cases of undefined behaviour. Certain constructs are called causing undefined behaviour because the standard does not define what happens or should happen. That does not mean that you will always get different behaviour on the same platform. Actually, you hardly ever get such different behaviour. But it may differ from compiler to compiler or from platform to platform. – Rudy Velthuis Sep 25 '18 at 13:09
@Voo "we have to make sure absolutely everyone uses exactly the same compiler and platform when compiling our code". I never said that, did I? Of course one should avoid undefined behaviour for code that must be portable. But sometimes, programs are not meant to be portable (if a program uses certain OS API functions, it generally isn't anyway). Then, often, certain undefined actions can still make sense, even if the standard calls it undefined behaviour. For instance, (ab)using unions to overlay certain different data types to get at their internals, etc. But that means you must be careful. – Rudy Velthuis Sep 25 '18 at 13:15
@Voo: equally, doing a cast like:`unsigned long x = *(unsigned long*)&mydouble;` might work on one platform, and not on another (e.g. where long is only 32 bit). So that is *undefined behaviour*. But if you **know** the code will run on a platform with 64 bit longs and where double is IEEE float64, it will work anyway, and reliably (**no different outcome each time**). – Rudy Velthuis Sep 25 '18 at 13:20
@Rudy "I never said that". You did say "If you don't change compiler and/or platform, it will keep on working". Which says that if you change one of those things it might not work any more. So you have to rely a) that every compiler you use or might use in the future handles UB in a deterministic way and b) that you exhaustively test every code that causes UB, including the callers. – Voo Sep 25 '18 at 13:21
@Rudy So if I know that my code uses 32-bit 2s complement ints, then I can always test if an addition overflows by checking `if (a + 1 < a) // overflow` right? Hey I do know that that's the case! And I just tested it on an old compiler and it worked! So clearly this is reliable and will never change by your argument, right? – Voo Sep 25 '18 at 13:23
@Voo: I said "If you don't change compiler and/or platform, it will keep on working", indeed. That does not imply what you said. And it is true, and contradicts what you said: "One day it could work, the next it could give you the wrong result and still the next day it could just crash". No, usually it won't. The same code on the same platform usually gives the same outcome. – Rudy Velthuis Sep 25 '18 at 13:28
@Rudy If your level for production code is "usually it won't [cause problems]" then good for you. Also you didn't answer my last question - do you agree that the `if (a + 1 < a) // overflow` will continue to work for me just fine as long as I stick to 32-bit 2s complement and don't change my platform? – Voo Sep 25 '18 at 13:31
@Voo: no, that is not what I said either. If one day, C might get overflow protection (very unlikely, but hey, stranger things have happened), it might become false. But as long as C does not, yes, it is reliable and depending on such an overflow or such a test has been used many many times in all kinds of C code: code doing big integer math, code doing CRC calculations or encryption/decryption/encoding/decoding, etc., i.e. even a lot of more or less "portable" code. – Rudy Velthuis Sep 25 '18 at 13:33
@Rudy "But as long as C does not, yes, it is reliable". And **that** is exactly why people shouldn't rely on specific instances of UB because "hey, I know my platform and I know exactly how compilers work". Go ahead and compile `int overflows(int a) { return a + 1 < a; }` with a recent gcc with optimisations and check the binary. Fun fact: it will be equivalent to `int overflows(int a) { return 0; }`. And yes, that has caused real-world bugs, because people thought they were particularly clever. – Voo Sep 25 '18 at 13:36
@Voo: you are arguing from the wrong premise. If my code is meant to be portable, I write portable code. If it is meant to be used on only one platform (say, an OS tool handling some specific data found in certain versions of that OS), then I can be sure that certain "tricks"/"hacks" work reliably, and if necessary, I will use them. I will even use assembler, when necessary. I never code on the premise "usually it won't cause problems". **I am merely saying that your statement, i.e. that the outcome differs, was wrong**. – Rudy Velthuis Sep 25 '18 at 13:37
@Voo: then I wonder why all the code that does use overflow actually works. Probably because they don't check, they simply **know** there will be overflow and when. And e.g. biginteger adding code does use such checks (`if (a + b < a)`, where a and b unsigned) to generate a carry. If gcc always returns false, in some circumstances, then gcc is buggy. Bugs can cause real world problems, indeed. – Rudy Velthuis Sep 25 '18 at 13:53
@Rudy "And e.g. biginteger adding code does use such checks (if (a + b < a), where a and b unsigned) to generate a carry". Which just goes to show that the developers of that code were clever enough to avoid UB when they wrote that code for unsigned ints. And you can go ahead and try the above snippet, or go read the bug reports that came in from people whose code broke when gcc did this perfectly acceptable optimisation and who complained saying basically exactly what you did: "Sure it's UB but it always worked in the past on this platform!". – Voo Sep 25 '18 at 13:58
@Voo: I think I said that things can change if you change the compiler. That is not what I was contesting. I was contesting your statement that you will get different outcomes on a daily basis, which is not nearly the same. And I don't understand "Which just goes to show that the developers of that code were clever enough to understand what causes UB and what not when they wrote that code for unsigned ints". They rely on overflow too: if the result is below one or each of the operands, there must have been overflow. So why does gcc not always return "false" there? – Rudy Velthuis Sep 25 '18 at 14:04
@Rudy So do I now really have to give you some code that will produce different results when run multiple times? (btw the difference between UB and the perfectly fine implementation defined code is in the used types). Hell I don't even have to, just go and look for all those stochastic exploits that only work X% of the time. Lots of those rely on UB. At what professional companies or open source programs have you worked where "UB is fine" was part of the programming guide? This conversation seems ridiculous. – Voo Sep 25 '18 at 14:13
I didn't say that such code exists. I can even write non-UB code that produces different results each time. No big deal. But I contested your general statement. Most of the time, even undefined behaviour does not mean undetermined behaviour. It merely means it is *not defined* by the standard. – Rudy Velthuis Sep 25 '18 at 14:30
@Rudy So you contested a **general statement** that something **might** happen, with "but it might not!". Well ok, yeah sure. – Voo Sep 25 '18 at 14:37
I didn't say that such code **doesn't** exist, I meant. – Rudy Velthuis Sep 25 '18 at 14:37
I contested your statement "One day it could work, the next it could give you the wrong result and still the next day it could just crash" with "no, generally and often, you will have the same outcome all the time". – Rudy Velthuis Sep 25 '18 at 14:39
Use `memcpy` to copy it into a local variable. If the CPU can reinterpret the value in place, the compiler will do exactly that and optimize away the local; if it can't (because of an alignment constraint), the compiler is probably prepared to do the copy reasonably efficiently. `memcpy` has been a builtin in mainstream compilers for ages, so the compiler will see what you mean. – doug65536 Apr 19 '22 at 13:07
Maybe I am overlooking it, but, where’s the expected answer from the example provided in your answer? What should I expect the output to be? – James Bush Jul 10 '22 at 02:37

score 35 · Answer 2 · answered Oct 26 '13 at 16:12

My advice is to stick to rule 0 and not redo what standard libraries already do, if that is enough. Look at math.h (cmath in standard C++) and the functions frexp, frexpf and frexpl, which break a floating-point value (double, float or long double) into its significand and exponent. To extract the sign from the significand you can use signbit, also in math.h / cmath, or copysign (C99; in C++ since C++11). Some alternatives with slightly different semantics are modf, and ilogb/scalbn (available in C++11); http://en.cppreference.com/w/cpp/numeric/math/logb compares them, but I didn't find in the documentation how all these functions behave with +/-inf and NaNs. Finally, if you really want to use bitmasks (e.g., you desperately need to know the exact bits, your program may have different NaNs with different representations, and you don't trust the above functions), at least make everything platform-independent by using the macros in float.h/cfloat.

Is there any portable way to detect the existence of the hidden bit in the significand? And what about detecting the exponent bias? — FrankHB, Nov 27 '21 at 21:02

score 26 · Answer 3 · edited May 23 '17 at 12:18

Find out the format of the floating point numbers used on the CPU that directly supports floating point and break it down into those parts. The most common format is IEEE-754.

Alternatively, you could obtain those parts using a few special functions (double frexp(double value, int *exp); and double ldexp(double x, int exp);) as shown in this answer.

Another option is to use %a with printf().

answered Mar 28 '13 at 15:05 by Alexey Frunze · edited by Community

Too bad I'm searching for a portable solution to *implement* `dtoa`, which is somewhat of a subset of `printf`... – FrankHB Nov 27 '21 at 20:59

Xymostech · Answer 4 · 2013-03-28T15:14:25.067

You're &ing the wrong bits. I think you want:

s = *ptr >> 31;
e = *ptr & 0x7f800000;
e >>= 23;
m = *ptr & 0x007fffff;

Remember, when you & with a mask, you zero out every bit that is not set in the mask. So in this case, you want to zero out the sign bit when you get the exponent, and you want to zero out the sign bit and the exponent when you get the mantissa.

Note that the masks come directly from your picture. So, the exponent mask will look like:

0 11111111 00000000000000000000000

and the mantissa mask will look like:

0 00000000 11111111111111111111111

answered Mar 28 '13 at 15:07 by Xymostech · edited Mar 28 '13 at 15:14

@MetallicPriest Try now, I had the wrong masks the first time. – Xymostech Mar 28 '13 at 15:12
What about the so-called hidden bit? I don't see anyone setting it: `m |= 0x00800000;`. Note that the number should be checked for special values (denormals, NaNs, infinities) first, since these require different treatment. – Rudy Velthuis Mar 29 '13 at 22:16
@RudyVelthuis From their original code, it doesn't look like they were trying to actually obtain the values of the exponent and mantissa, just trying to get the bit representation of each. I'm assuming this because they didn't OR in the hidden bit or normalize the sign, but I could be wrong. – Xymostech Mar 30 '13 at 01:50
I'm assuming they forgot and that is why they got wrong values. But I can only guess. – Rudy Velthuis Mar 30 '13 at 09:13

score 12 · Answer 5 · answered Sep 17 '15 at 22:01

On Linux, the glibc-headers package provides the header #include <ieee754.h> with floating-point type definitions, e.g.:

union ieee754_double
  {
    double d;

    /* This is the IEEE 754 double-precision format.  */
    struct
      {
#if __BYTE_ORDER == __BIG_ENDIAN
    unsigned int negative:1;
    unsigned int exponent:11;
    /* Together these comprise the mantissa.  */
    unsigned int mantissa0:20;
    unsigned int mantissa1:32;
#endif              /* Big endian.  */
#if __BYTE_ORDER == __LITTLE_ENDIAN
# if    __FLOAT_WORD_ORDER == __BIG_ENDIAN
    unsigned int mantissa0:20;
    unsigned int exponent:11;
    unsigned int negative:1;
    unsigned int mantissa1:32;
# else
    /* Together these comprise the mantissa.  */
    unsigned int mantissa1:32;
    unsigned int mantissa0:20;
    unsigned int exponent:11;
    unsigned int negative:1;
# endif
#endif              /* Little endian.  */
      } ieee;

    /* This format makes it easier to see if a NaN is a signalling NaN.  */
    struct
      {
#if __BYTE_ORDER == __BIG_ENDIAN
    unsigned int negative:1;
    unsigned int exponent:11;
    unsigned int quiet_nan:1;
    /* Together these comprise the mantissa.  */
    unsigned int mantissa0:19;
    unsigned int mantissa1:32;
#else
# if    __FLOAT_WORD_ORDER == __BIG_ENDIAN
    unsigned int mantissa0:19;
    unsigned int quiet_nan:1;
    unsigned int exponent:11;
    unsigned int negative:1;
    unsigned int mantissa1:32;
# else
    /* Together these comprise the mantissa.  */
    unsigned int mantissa1:32;
    unsigned int mantissa0:19;
    unsigned int quiet_nan:1;
    unsigned int exponent:11;
    unsigned int negative:1;
# endif
#endif
      } ieee_nan;
  };

#define IEEE754_DOUBLE_BIAS 0x3ff /* Added to exponent.  */

How do we use these in practice? If we have to check whether we got a NaN, how do we do it? — Dimitri Lesnoff, Jun 16 '22 at 16:40
@DimitriLesnoff Start with [`std::isnan`](https://en.cppreference.com/w/cpp/numeric/math/isnan). — Maxim Egorushkin, Jun 19 '22 at 22:46
This looks like a great (thorough) start, but would it be too much to ask for a little test code that shows the input and the output, and how the two work (or don’t)? — James Bush, Jul 10 '22 at 03:36

score 2 · Answer 6 · answered Sep 22 '18 at 22:15

Don't make functions that do multiple things.
Don't mask then shift; shift then mask.
Don't mutate values unnecessarily because it's slow, cache-destroying and error-prone.
Don't use magic numbers.

/* NaNs, infinities, denormals unhandled */
/* assumes sizeof(float) == 4 and uses ieee754 binary32 format */
/* assumes two's-complement machine */
/* C99 */
#include <math.h>
#include <stdint.h>
#include <string.h>

/* signbit() also gets -0.0f right; the comparison (f) <= -0.0
   is true for +0.0 as well, so it cannot tell the zeros apart. */
#define SIGN(f) (signbit(f) ? 1 : 0)

/* Reinterpret the bits via memcpy; dereferencing a cast pointer
   violates the strict-aliasing rules. */
static uint32_t as_u32(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}

#define FLOAT_EXPONENT_WIDTH 8
#define FLOAT_MANTISSA_WIDTH 23
#define FLOAT_BIAS ((1<<(FLOAT_EXPONENT_WIDTH-1))-1) /* 2^(e-1)-1 */
#define MASK(width)  ((1u<<(width))-1u) /* 2^w - 1 */
#define FLOAT_IMPLICIT_MANTISSA_BIT (1u<<FLOAT_MANTISSA_WIDTH)

/* correct exponent with bias removed */
int float_exponent(float f) {
  return (int)((as_u32(f) >> FLOAT_MANTISSA_WIDTH) & MASK(FLOAT_EXPONENT_WIDTH)) - FLOAT_BIAS;
}

/* of non-zero, normal floats only */
int float_mantissa(float f) {
  return (int)((as_u32(f) & MASK(FLOAT_MANTISSA_WIDTH)) | FLOAT_IMPLICIT_MANTISSA_BIT);
}

/* Hacker's Delight book is your friend. */

"Don't use magic numbers" strongly suggests using `DBL_MANT_DIG` rather than creating one's own version! — Toby Speight, Mar 07 '21 at 09:12

AymenTM · Answer 7 · 2019-11-27T13:37:10.980

See this IEEE_754_types.h header for union types that decompose float, double and long double (endianness handled). Here is an extract:

/*
** - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
**  Single Precision (float)  --  Standard IEEE 754 Floating-point Specification
*/

# define IEEE_754_FLOAT_MANTISSA_BITS (23)
# define IEEE_754_FLOAT_EXPONENT_BITS (8)
# define IEEE_754_FLOAT_SIGN_BITS     (1)

.
.
.

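# if (IS_BIG_ENDIAN == 1)
    typedef union {
        float value;
        struct {
            /* Unsigned 32-bit base type for every field: a signed __int8_t
               exponent would read values >= 128 as negative, and an 8-bit
               field that would straddle a byte boundary may be moved to the
               next byte by the ABI, breaking the layout. */
            __uint32_t sign     : IEEE_754_FLOAT_SIGN_BITS;
            __uint32_t exponent : IEEE_754_FLOAT_EXPONENT_BITS;
            __uint32_t mantissa : IEEE_754_FLOAT_MANTISSA_BITS;
        };
    } IEEE_754_float;
# else
    typedef union {
        float value;
        struct {
            __uint32_t mantissa : IEEE_754_FLOAT_MANTISSA_BITS;
            __uint32_t exponent : IEEE_754_FLOAT_EXPONENT_BITS;
            __uint32_t sign     : IEEE_754_FLOAT_SIGN_BITS;
        };
    } IEEE_754_float;
# endif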

And see dtoa_base.c for a demonstration of how to convert a double value to string form.

Furthermore, check out section 1.2.1.1.4.2 - Floating-Point Type Memory Layout of the C/CPP Reference Book; it explains very clearly, in simple terms, the memory representation/layout of all the floating-point types and how to decode them (with illustrations), following the actual IEEE 754 floating-point specification.

It also has links to really good resources that explain it in more depth.
