IEEE 754 Bit manipulation Rounding Error

Question

Without using casts or library functions, I must convert an integer to a float using bit manipulation. Below is the code I am currently working on. It is based on code that I found in "Cast Integer to Float using Bit Manipulation breaks on some integers in C". The problem I have run into involves the rounding rules in IEEE 754. More specifically, my code rounds toward zero (it truncates), but it should round to nearest, ties to even. What changes do I need to make?

unsigned inttofloat(int x) {
    int bias = 127;
    int man;
    int exp = bias + 31; //8-bit exp
    int count = 0;
    int tmin = 1 << 31;
    int manpattern = 0x7FFFFF;

    int sign = 0;

    if (x == 0){
        return 0;
    }
    else if (x == tmin){
        return 0xcf << 24;
    }

    if (x < 0) {
        sign = tmin;
        x = ~x + 1; // negates x, making it positive so that its magnitude can be encoded later on.
    }

    while((x & tmin) == 0){
        exp--;
        x <<= 1;
        count++;
    }

    exp <<= 23;
    man = (x >> 8) & manpattern;

    return (sign | exp | man);
}
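Here is a worked example of a value where truncation and round-to-nearest-even disagree. It assumes the usual IEEE 754 binary32 format, which has a 24-bit significand (1 implicit bit plus 23 stored bits), and takes x = 16777219:

```latex
\begin{aligned}
x &= 16777219 = 2^{24} + 3, \qquad \mathrm{ulp}(x) = 2^{24-23} = 2 \\
\text{lower neighbor: } 2^{24} + 2 &= 16777218 = 2 \cdot (2^{23} + 1) \quad \text{(odd significand)} \\
\text{upper neighbor: } 2^{24} + 4 &= 16777220 = 2 \cdot (2^{23} + 2) \quad \text{(even significand)} \\
x - 16777218 &= 16777220 - x = 1 \quad \text{(exact tie)} \\
\text{truncation} &\to 16777218, \qquad \text{nearest, ties to even} \to 16777220
\end{aligned}
```

Truncation keeps the lower neighbor. Round to nearest, ties to even, picks the neighbor whose significand is even.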

`int tmin = 1 << 31;` undefined behavior on an implementation where `int` has less than 33 bits. — EOF, Feb 03 '17 at 19:09
How is that? In C99, a shift by the word size minus 1 is allowed, and this is exactly that. I have 32 bits in my integer, so a left shift by 31 is still well defined. — Kurt Price, Feb 03 '17 at 19:11
Here is an article that explains exactly this @EOF http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html — Kurt Price, Feb 03 '17 at 19:11
C99 draft standard n1256: *6.5.7 Bitwise shift operators 4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.* — EOF, Feb 03 '17 at 19:12
By the same token, `x <<= 1;` in `while((x & tmin) == 0)` is guaranteed to exhibit undefined behavior. — EOF, Feb 03 '17 at 19:15
Yes, and since I am returning it as an unsigned integer, I do not care about the value. The value is represented as the minimum two's complement number. I think you are missing the point. — Kurt Price, Feb 03 '17 at 19:15
I invite you to read up on the meaning of *undefined behavior* in C. Whether or not you care about the value is immaterial to the problem. — EOF, Feb 03 '17 at 19:16
Typical C question - nice question, oh *wait*, there is some UB there, now we can all avoid addressing the real issue. Bonus points if: you get a useless answer about the UB, you update the question, then you get chastised for it. More bonus if: you then make a new question without the UB, and get chastised for re-asking. — harold, Feb 03 '17 at 19:38
@harold: C is a demanding language. It requires a certain mental discipline, and if you can't follow it, attempting to write C is a waste of time for you, and asking questions about it is a waste of time for everybody else. — EOF, Feb 03 '17 at 19:58
@EOF it does not *require* anything lofty like that. Writing portable or even standard C is completely optional. — harold, Feb 03 '17 at 20:10
There is little reason to use `int` throughout the code and good reasons to use `unsigned`. Suggest coding with `unsigned inttofloat(int x) { unsigned u = x; ...` and then only using `u`. — chux - Reinstate Monica, Feb 03 '17 at 20:18
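Below is a minimal sketch of that suggestion. It assumes a 32-bit `unsigned`. The sign bit is built as `1u << 31`, and the magnitude and normalization are computed on an `unsigned` copy, so none of the shifts discussed above is undefined. The helper names `magnitude`, `normalize`, and `SIGN_BIT` are hypothetical.

```c
/* Hypothetical helpers illustrating the unsigned-only approach.
   Assumes unsigned is 32 bits wide. */

/* Sign bit of a 32-bit pattern: 1u << 31 is well defined, unlike 1 << 31. */
#define SIGN_BIT (1u << 31)

/* Absolute value as unsigned. This is also correct for INT_MIN, whose
   magnitude (2^31) does not fit in a 32-bit int but fits in a 32-bit unsigned. */
unsigned magnitude(int x) {
    unsigned u = x;          /* int -> unsigned conversion is defined (modulo 2^32) */
    return x < 0 ? 0u - u : u;
}

/* Shift *u left until its top bit is set and return the number of shifts.
   Requires *u != 0; otherwise the loop would never terminate. */
int normalize(unsigned *u) {
    int shifts = 0;
    while ((*u & SIGN_BIT) == 0) {
        *u <<= 1;            /* unsigned left shift is never undefined */
        shifts++;
    }
    return shifts;
}
```

For example, `magnitude(-2147483647 - 1)` gives `0x80000000u` without any signed overflow, and normalizing `1u` takes 31 shifts.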

chux - Reinstate Monica · Accepted Answer · 2017-02-03T20:37:58.540

To round to nearest, ties to even, replace `(x >> 8)` with:

unsigned u = x;  // avoid any potential signed shifting issues
unsigned least_significant_bit = (u >> 8) & 1;
unsigned round_bit = (u >> 7) & 1; // Most significant bit shifted out
unsigned sticky_bit_flag = !!(u & 0x7F);  // All other bits shifted out

// OP's shifted answer.
u = (u >> 8);

// round away if more than half-way or
//  if at half-way and number is odd
u += (round_bit & sticky_bit_flag) | (round_bit & least_significant_bit);

Simplifying this is left to the OP.
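One possible simplification is to factor out `round_bit`, because `(r & s) | (r & l)` equals `r & (s | l)`. This is a sketch, not part of the original answer. It wraps the step in a hypothetical helper `shift8_round_even` and assumes a 32-bit `unsigned`:

```c
/* Hypothetical helper: shift a 32-bit value right by 8, rounding the
   dropped bits to nearest, ties to even. Bit names follow the answer.
   A result of 2^24 (0x1000000) signals that the exponent must increase. */
unsigned shift8_round_even(unsigned u) {
    unsigned least_significant_bit = (u >> 8) & 1; /* lowest kept bit */
    unsigned round_bit = (u >> 7) & 1;             /* highest dropped bit */
    unsigned sticky_bit_flag = !!(u & 0x7F);       /* any other dropped bit set */

    /* Round up when above halfway, or exactly halfway with an odd result. */
    return (u >> 8) + (round_bit & (sticky_bit_flag | least_significant_bit));
}
```

For instance, `0x180` (exactly halfway, odd kept value 1) rounds up to 2, while `0x280` (exactly halfway, even kept value 2) stays at 2.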

Note that when rounding adds 1, the carry may propagate through all 24 kept bits and require an exponent increase.
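The sketch below puts the pieces together. It rewrites the question's `inttofloat` with unsigned arithmetic, as suggested in the comments, and adds the answer's rounding plus the exponent carry. The name `inttofloat_rne` is hypothetical. The code assumes 32-bit `int` and `unsigned`, two's complement, and IEEE 754 binary32 as the target format. It uses no casts and no library calls.

```c
/* Hypothetical sketch: int -> IEEE 754 binary32 bit pattern with
   round to nearest, ties to even. Assumes 32-bit int/unsigned. */
unsigned inttofloat_rne(int x) {
    unsigned sign = 0;
    unsigned u = x;                /* two's complement bits of x, well defined */
    unsigned biased_exp = 127 + 31; /* exponent if the leading 1 is at bit 31 */

    if (x == 0)
        return 0;
    if (x < 0) {
        sign = 1u << 31;
        u = 0u - u;                /* magnitude; also correct for INT_MIN */
    }

    /* Normalize: move the leading 1 up to bit 31. */
    while ((u & (1u << 31)) == 0) {
        u <<= 1;
        biased_exp--;
    }

    /* Keep the top 24 bits (implicit 1 plus 23 mantissa bits) and round
       the 8 dropped bits to nearest, ties to even. */
    unsigned lsb = (u >> 8) & 1;
    unsigned round_bit = (u >> 7) & 1;
    unsigned sticky = !!(u & 0x7F);
    u = (u >> 8) + (round_bit & (sticky | lsb));

    /* A carry out of the 24-bit significand turns it into 2^24:
       bump the exponent. The stored mantissa field is then 0. */
    if (u & (1u << 24))
        biased_exp++;

    return sign | (biased_exp << 23) | (u & 0x7FFFFF);
}
```

For example, `inttofloat_rne(16777219)` yields `0x4B800002`, which is the bit pattern of 16777220.0f. On a typical compiler, the question's truncating version would produce `0x4B800001` (16777218.0f) instead. `INT_MAX` exercises the carry path: every kept bit is 1, so rounding up gives `0x4F000000`, which is 2^31.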

edited Feb 03 '17 at 20:37

answered Feb 03 '17 at 20:03

chux - Reinstate Monica

Thank you so much for the reply. I am a little confused, though, about where to put this. You only said to replace `(x>>8)`. Should I replace that entire line or keep the `& manpattern`? – Kurt Price Feb 03 '17 at 20:49
The final `u` could be used as `man = u & manpattern;`, yet _take time_ to understand the code rather than simply cut and paste. In particular, if `((u >> 8) & manpattern) == manpattern` and the value rounds up, the carry from `u += ...` propagates out of the mantissa and the exponent needs to increase. – chux - Reinstate Monica Feb 03 '17 at 21:19
