Rounding issues with bitwise C code

Question

I have the following bitwise code, which converts an int value to a floating-point value (packaged in an int).

Question: There are rounding issues, so it fails for inputs such as 0x80000001. How do I handle this?

Here is the code:

  if(x == 0) return x;

  unsigned int signBit = 0;
  unsigned int absX = (unsigned int)x;
  if (x < 0)
  {
      signBit = 0x80000000u;
      absX = (unsigned int)-x;
  }

  unsigned int exponent = 158;
  while ((absX & 0x80000000) == 0)
  {
      exponent--;
      absX <<= 1;
  }

  unsigned int mantissa = absX >> 8;

  unsigned int result = signBit | (exponent << 23) | (mantissa & 0x7fffff);
  printf("\nfor x: %x, result: %x",x,result);
  return result;

Good: You marked this as homework and posted your code. Bad: You didn't ask your question! What do you need help with? — lc., Sep 10 '12 at 01:05
Do you have examples of expected input and output? For example, the 32-bit float pattern 0x80000001 is a very small negative number close to zero. So if you want the `int` of that, then the answer is zero? — Mark Tolonen, Sep 10 '12 at 02:14
Yes: Test [0x80000001] gives [0xceffffff] - expected [0xcf000000] — Anonymous, Sep 10 '12 at 02:17
Out of curiosity, is the person asking [this question](http://stackoverflow.com/q/12336314/900873) also in your class? — Kevin, Sep 10 '12 at 02:27

Kevin · Answer 1 · 2012-09-10T02:02:11.543

1

That's because 0x80000001 needs more precision than a float has. Read the linked article: the precision of a float is 24 bits, so two values whose difference (x - y) is less than the highest set bit of the larger one >> 24 generally cannot be told apart once they are converted. gdb agrees with your cast:

main.c:

#include <stdio.h>

int main() {
    float x = 0x80000001;
    printf("%f\n",x);
    return 0;
}

gdb:

Breakpoint 1, main () at test.c:4
4       float x = 0x80000001;
(gdb) n
5       printf("%f\n",x);
(gdb) p x
$1 = 2.14748365e+09
(gdb) p (int)x
$2 = -2147483648
(gdb) p/x (int)x
$3 = 0x80000000
(gdb)

The limit of this imprecision:

(gdb) p 0x80000000 == (float)0x80000080 
$21 = 1
(gdb) p 0x80000000 == (float)0x80000081
$20 = 0

The actual bitwise representation:

(gdb) p/x (int)(void*)(float)0x80000000
$27 = 0x4f000000
(gdb) p/x (int)(void*)(float)0x80000080
$28 = 0x4f000000
(gdb) p/x (int)(void*)(float)0x80000081
$29 = 0x4f000001
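
Outside gdb, one common way to read the same bit pattern in C is to copy the float into a same-sized unsigned integer with memcpy. This is a minimal sketch, assuming a 32-bit IEEE 754 float and the default round-to-nearest mode; float_bits is a hypothetical helper name, not something from the question:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bits of a float.
   Assumes float is a 32-bit IEEE 754 single-precision value. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    /* memcpy avoids the aliasing problems of casting a float* to an int*. */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under those assumptions, float_bits((float)0x80000080u) should give 0x4f000000 and float_bits((float)0x80000081u) should give 0x4f000001, matching the gdb output above.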

doubles do have enough precision to make the distinction:

(gdb) p 0x80000000 == (float)0x80000001
$1 = 1
(gdb) p 0x80000000 == (double)0x80000001
$2 = 0
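
If the goal is only to produce the expected 0xcf000000 with bitwise operations, one option is to round to nearest using the 8 bits that the final shift discards, instead of truncating them. The following is a sketch under stated assumptions, not the asker's intended solution: it assumes 32-bit ints, IEEE 754 single precision, and round-half-to-even; int_to_float_bits is a hypothetical name.

```c
#include <stdint.h>

/* Sketch: convert a 32-bit int to the bit pattern of the nearest float,
   rounding ties to even. Assumes IEEE 754 single precision. */
static uint32_t int_to_float_bits(int32_t x)
{
    if (x == 0)
        return 0;

    uint32_t sign = 0;
    uint32_t abs_x = (uint32_t)x;
    if (x < 0) {
        sign = 0x80000000u;
        abs_x = 0u - abs_x;   /* unsigned negate, also defined for INT_MIN */
    }

    /* Normalize so the leading 1 sits in bit 31. */
    uint32_t exponent = 158;  /* bias 127 + 31 */
    while ((abs_x & 0x80000000u) == 0) {
        exponent--;
        abs_x <<= 1;
    }

    uint32_t mantissa = abs_x >> 8;     /* 24 bits, including the leading 1 */
    uint32_t dropped  = abs_x & 0xFFu;  /* the 8 bits that do not fit */

    /* Round to nearest; on an exact tie (0x80), round to the even mantissa. */
    if (dropped > 0x80u || (dropped == 0x80u && (mantissa & 1u)))
        mantissa++;

    /* Rounding up from 0xFFFFFF carries into bit 24, so renormalize. */
    if (mantissa & 0x01000000u) {
        mantissa >>= 1;
        exponent++;
    }

    return sign | (exponent << 23) | (mantissa & 0x007FFFFFu);
}
```

With this rounding step, the input 0x80000001 (decimal -2147483647) produces 0xcf000000, the same pattern as -2147483648. The two inputs still cannot be distinguished in a float; the rounding only picks the nearest representable value.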

edited Sep 10 '12 at 02:02

answered Sep 10 '12 at 01:16

Kevin


@Anonymous use `double`s. I've added some explanation. It boils down to the fact that the number of bits in the mantissa of a float simply isn't enough to store those two numbers in a distinct manner. – Kevin Sep 10 '12 at 01:49
Also, (general comment) if anyone knows an easier way to get the bitwise representation of a float, I'd love to hear it. – Kevin Sep 10 '12 at 01:53
@Anonymous That's why I said `double`, not `long`. – Kevin Sep 10 '12 at 01:57
int sign = x>>31; int exp = (0x7F800000 & x)>>23; int mant = (0x007FFFFF & x); – Anonymous Sep 10 '12 at 01:58
Cannot use double - using bitwise ops. – Anonymous Sep 10 '12 at 02:00
As I said, `float`s just don't have the bits to make that distinction, so if you're stuck with 32-bit `int`s and `float`s, you *can't* distinguish between `(float)0x80000000` and `(float)0x80000001` because they have the same bitwise representation. ***There's no way around that***. – Kevin Sep 10 '12 at 02:09
@Kevin, your C code is incorrect. The correct float value for 0x80000001 is -1.4012984643248171e-045. – Mark Tolonen Sep 10 '12 at 02:09
I am just trying to get: *(int *) &someIntWithFloatVal; – Anonymous Sep 10 '12 at 02:11
@Anonymous are you saying your code can't distinguish between the bits `0x80000000` and `0x80000001`, representing -0 and -1.40129846e-45, or the float value of `0x80000000`, decimal `-2147483648`, and `0x80000001` = decimal `-2147483647`, both with floating point representations of `0x4f000000`? – Kevin Sep 10 '12 at 02:18
@MarkTolonen My c code does precisely what I intend it to, store the value 2147483648.0 into a `float`. I've asked the OP for confirmation that this is what he intended. – Kevin Sep 10 '12 at 02:22
Ah! Now 0xCF000000 makes sense above (the correct value for both -2147483648 and -2147483647). – Mark Tolonen Sep 10 '12 at 02:23
@Kevin, sorry misunderstood the question until OP provided sample input and output. – Mark Tolonen Sep 10 '12 at 02:29
