How to manually (bitwise) perform (float)x?

Question

Now, here is the function header of the function I'm supposed to implement:

/*
 * float_from_int - Return bit-level equivalent of expression (float) x
 *   Result is returned as unsigned int, but
 *   it is to be interpreted as the bit-level representation of a
 *   single-precision floating point values.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_from_int(int x) {
...
}

We aren't allowed to do float operations, or any kind of casting.

Now I tried to implement the first algorithm given at this site: http://locklessinc.com/articles/i2f/

Here's my code:

unsigned float_from_int(int x) {

// grab sign bit

  int xIsNegative = 0;
  int absValOfX = x;

  if(x < 0){
    xIsNegative = 1;
    absValOfX = -x;
  }




  // zero case
  if(x == 0){
    return 0;
  }
  if(x == 0x80000000){ //Updated to add this
    return 0xcf000000;
  }
  //int shiftsNeeded = 0;

  /*while(){

    shiftsNeeded++;
    }*/


  unsigned I2F_MAX_BITS = 15;
  unsigned I2F_MAX_INPUT = ((1 << I2F_MAX_BITS) - 1);
  unsigned I2F_SHIFT = (24 - I2F_MAX_BITS);

  unsigned result, i, exponent, fraction;

  if ((absValOfX & I2F_MAX_INPUT) == 0)
    result = 0;
  else {
    exponent = 126 + I2F_MAX_BITS;
    fraction = (absValOfX & I2F_MAX_INPUT) << I2F_SHIFT;

    i = 0;
    while(i < I2F_MAX_BITS) {
      if (fraction & 0x800000)
        break;
      else {
        fraction = fraction << 1;
        exponent = exponent - 1;
      }
      i++;
    }
    result = (xIsNegative << 31) | exponent << 23 | (fraction & 0x7fffff);
  }
  return result;
}

But it didn't work (see test error below):

ERROR: Test float_from_int(8388608[0x800000]) failed...
...Gives 0[0x0]. Should be 1258291200[0x4b000000]

I don't know where to go from here. How should I go about parsing the float from this int?

EDIT #1: You might be able to see from my code that I also started working on this algorithm (see this site):

I assumed 10-bit, 2’s complement, integers since the mantissa is only 9 bits, but the process generalizes to more bits.

Save the sign bit of the input and take the absolute value of the input.
Shift the input left until the high order bit is set and count the number of shifts required. This forms the floating mantissa.
Form the floating exponent by subtracting the number of shifts from step 2 from the constant 137 or (0h89-(#of shifts)).
Assemble the float from the sign, mantissa, and exponent.

But, that doesn't seem right. How could I convert 0x80000000? Doesn't make sense.

EDIT #2: I think it's because I say max bits is 15... hmmm...

EDIT #3: Screw that old algorithm, I'm starting over:

unsigned float_from_int(int x) {

  // grab sign bit

  int xIsNegative = 0;
  int absValOfX = x;

  if(x < 0){
    xIsNegative = 1;
    absValOfX = -x;
  }


  // zero case
  if(x == 0){
    return 0;
  }
  if (x == 0x80000000){
    return 0xcf000000;
  }

  int shiftsNeeded = 0;

  int counter = 0;
  while(((absValOfX >> counter) & 1) != 1 && shiftsNeeded < 32){

    counter++;
    shiftsNeeded++;
  }

  unsigned exponent = shiftsNeeded + 127;

  unsigned result = (xIsNegative << 31) | (exponent << 23);

  return result;

Here's the error I get on this one (I think I got past the last error):

ERROR: Test float_from_int(-2139095040[0x80800000]) failed...
...Gives -889192448[0xcb000000]. Should be -822149120[0xceff0000]

May be helpful to know that: absValOfX = 7f800000 (using printf)

EDIT #4: Ah, I'm finding the exponent wrong, need to count from the left, then subtract from 32 I believe.

EDIT #5: I started over, now trying to deal with weird rounding problems...

  if (x == 0){
    return 0; // 0 is a special case because it has no 1 bits
  }
  if (x >= 0x80000000 && x <= 0x80000040){
    return 0xcf000000;
  }
  // Save the sign bit of the input and take the absolute value of the input.
  unsigned signBit = 0;
  unsigned absX = (unsigned)x;
  if (x < 0)
    {
      signBit = 0x80000000u;
      absX = (unsigned)-x;
    }

  // Shift the input left until the high order bit is set to form the mantissa.
  // Form the floating exponent by subtracting the number of shifts from 158.
  unsigned exponent = 158;
  while ((absX & 0x80000000) == 0)
    {
      exponent--;
      absX <<= 1;
    }

  unsigned negativeRoundUp = (absX >> 7) & 1 & (absX >> 8);

  // compute mantissa
  unsigned mantissa = (absX >> 8) + ((negativeRoundUp) || (!signBit & (absX >> 7) & (exponent < 156)));
  printf("absX = %x, absX >> 8 = %x, exponent = %i,  mantissa = %x\n", absX, (absX >> 8), exponent, mantissa);
  // Assemble the float from the sign, mantissa, and exponent.
  return signBit | ((exponent << 23) + (signBit & negativeRoundUp)) | ( (mantissa) & 0x7fffff);

-

absX = fe000084, absX >> 8 = fe0000, exponent = 156,  mantissa = fe0000
ERROR: Test float_from_int(1065353249[0x3f800021]) failed...
...Gives 1316880384[0x4e7e0000]. Should be 1316880385[0x4e7e0001]

EDIT #6

Did it again, still, the rounding doesn't work properly. I've tried to hack together some rounding, but it just won't work...

unsigned float_from_int(int x) {






  /*
  If N is negative, negate it in two's complement. Set the high bit (2^31) of the result.
    If N < 2^23, left shift it (multiply by 2) until it is greater or equal to.
    If N ≥ 2^24, right shift it (unsigned divide by 2) until it is less.
    Bitwise AND with ~2^23 (one's complement).
    If it was less, subtract the number of left shifts from 150 (127+23).
  If it was more, add the number of right shifts to 150.
    This new number is the exponent. Left shift it by 23 and add it to the number from step 3.
  */

  printf("---------------\n");
  //printf("x = %i (%x), -x = %i, (%x)\n", x, x, -x, -x);
  if(x == 0){
    return 0;
  }

  if(x == 0x80000000){
    return 0xcf000000;
  }

  // If N is negative, negate it in two's complement. Set the high bit of the result
  unsigned signBit = 0;

  if (x < 0){
    signBit = 0x80000000;
    x = -x;
  }

  printf("abs val of x = %i (%x)\n", x, x);

  int roundTowardsZero = 0;
  int lastDigitLeaving = 0;
  int shiftAmount = 0;
  int originalAbsX = x;

  // If N < 2^23, left shift it (multiply it by 2) until it is great or equal to.
  if(x < (8388608)){
    while(x < (8388608)){
      //printf(" minus shift and x = %i", x );
      x = x << 1;
      shiftAmount--;
    }
  } // If N >= 2^24, right shfit it (unsigned divide by 2) until it is less.
 else if(x >= (16777215)){
    while(x >= (16777215)){

      /*if(x & 1){
        roundTowardsZero = 1;
        printf("zzz Got here ---");
        }*/

      lastDigitLeaving = (x >> 1) & 1;
      //printf(" plus shift and x = %i", x);
      x = x >> 1;
      shiftAmount++;

    }
    //Round towards zero
    x = (x + (lastDigitLeaving && (!(originalAbsX > 16777216) || signBit)));




    printf("x = %i\n", x);
    //shiftAmount = shiftAmount + roundTowardsZero;
  }

  printf("roundTowardsZero = %i, shiftAmount = %i (%x)\n", roundTowardsZero, shiftAmount, shiftAmount);

  // Bitwise AND with 0x7fffff
 x = x & 0x7fffff;

  unsigned exponent = 150 + shiftAmount;

  unsigned rightPlaceExponent = exponent << 23;

  printf("exponent = %i, rightPlaceExponent = %x\n", exponent, rightPlaceExponent);

  unsigned result = signBit | rightPlaceExponent | x;

  return result;

I'm not sure, but I don't think we are allowed to cast an int to an unsigned. Also, it wouldn't matter right? Because a positive int behaves a lot like an unsigned? (I don't know) — , Sep 09 '12 at 03:43
I'm not sure what to say, but sometimes when you do bit shifts and things on signed integers strange things can happen. Another issue is that your result value will never be negative because it's unsigned. — GWW, Sep 09 '12 at 03:48
0x80000000 is a special case of the signed integers. It is the only one that is its own two's complement. It is the most negative number and doesn't have a matching positive number. — UncleO, Sep 09 '12 at 03:54
But, if you cast 0x80000000 to an unsigned, it comes out as 0x80000000. Therefore, as a nice counter-special-case, doing `int x; if(x<0) x = -x; unsigned v = (unsigned)x;` always produces the correct absolute value in `v`. — nneonneo, Sep 09 '12 at 05:00
Incidentally, I would consider signed-unsigned conversion to be a legal 'signed/unsigned' operation, but of course I suppose the final decision is up to whoever made the problem. If you can't do the conversion then you'll have to settle for special-casing 0x80000000. — nneonneo, Sep 09 '12 at 05:02
See this post: http://stackoverflow.com/questions/12342926/casting-float-to-int-bitwise-in-c/12343933#12343933 — Anonymous, Sep 10 '12 at 00:45
duplicates: [Casting float to int (bitwise) in C](https://stackoverflow.com/q/12342926/995714), [Converting Int to Float or Float to Int using Bitwise operations (software floating point)](https://stackoverflow.com/q/20302904/995714), [How to convert an unsigned int to a float?](https://stackoverflow.com/q/19529356/995714) — phuclv, May 15 '19 at 04:14

Gabe · Accepted Answer · 2012-09-09T07:11:46.267

3

The problem is that the lowest int is -2147483648, but the highest is 2147483647, so there is no absolute value of -2147483648. While you could work around it, I would just make a special case for that one bit pattern (like you do for 0):

if (x == 0)
    return 0;
if (x == -2147483648)
    return 0xcf000000;

The other problem is that you copied an algorithm that only works for numbers from 0 to 32767. Further down in the article they explain how to expand it to all ints, but it uses operations that you're likely not allowed to use.

I would recommend writing it from scratch based on the algorithm mentioned in your edit. Here's a version in C# that rounds towards 0:

uint float_from_int(int x)
{
    if (x == 0)
        return 0; // 0 is a special case because it has no 1 bits

    // Save the sign bit of the input and take the absolute value of the input.
    uint signBit = 0;
    uint absX = (uint)x;
    if (x < 0)
    {
        signBit = 0x80000000u;
        absX = (uint)-x;
    }

    // Shift the input left until the high order bit is set to form the mantissa.
    // Form the floating exponent by subtracting the number of shifts from 158.
    uint exponent = 158;
    while ((absX & 0x80000000) == 0)
    {
        exponent--;
        absX <<= 1;
    }

    // compute mantissa
    uint mantissa = absX >> 8;

    // Assemble the float from the sign, mantissa, and exponent.
    return signBit | (exponent << 23) | (mantissa & 0x7fffff);
}

edited Sep 09 '12 at 07:11

answered Sep 09 '12 at 03:55

Gabe

84,912
12
139
238

Nice, I'm closer to the answer now! Thank you! I got an additional error which I'm working on now: "ERROR: Test float_from_int(8388608[0x800000]) failed... ...Gives 0[0x0]. Should be 1258291200[0x4b000000]" (see edit to post) – Sep 09 '12 at 04:03
I implemented your algorithm, but I get this error: "ERROR: Test float_from_int(-2147483647[0x80000001]) failed... ...Gives -822083585[0xceffffff]. Should be -822083584[0xcf000000]" I've been trying to deal with this rounding stuff for some time now. I'm at my wits end and questioning my sanity. I do appreciate you taking the time to help tho, thank you – Sep 09 '12 at 15:30
@Silver: You might want to ask another question about how to implement the rounding method you need. – Gabe Sep 09 '12 at 22:12
1

Why is the exponent set to 158? (I'd post as a new question, but it seems like it might get closed for some reason.) – JustBlossom Jul 15 '16 at 09:22
@JustBlossom 127 corrects for the exponent bias. add 31 to get to the max exponent that you can find on an int. – 7anner Sep 27 '17 at 18:51

score 1 · Answer 2 · answered Sep 09 '12 at 05:45

The basic formulation of the algorithm is to determine the sign, exponent and mantissa bits, then pack the result into an integer. Breaking it down this way makes it easy to clearly separate the tasks in code and makes solving the problem (and testing your algorithm) much easier.

The sign bit is the easiest, and getting rid of it makes finding the exponent easier. You can distinguish four cases: 0, 0x80000000, [-0x7ffffff, -1], and [1, 0x7fffffff]. The first two are special cases, and you can trivially get the sign bit in the last two cases (and the absolute value of the input). If you're going to cast to unsigned, you can get away with not special-casing 0x80000000 as I mentioned in a comment.

Next up, find the exponent -- there's an easy (and costly) looping way, and a trickier but faster way to do this. My absolute favourite page for this is Sean Anderson's bit hacks page. One of the algorithms shows a very quick loop-less way to find the log₂ of an integer in only seven operations.

Once you know the exponent, then finding the mantissa is easy. You just drop the leading one bit, then shift the result either left or right depending on the exponent's value.

If you use the fast log₂ algorithm, you can probably end up with an algorithm which uses no more than 20 operations.

Working through the ideas in the post right now, I'll update when I make progress — , Sep 09 '12 at 06:02

MSN · Answer 3 · 2012-09-09T19:22:31.833

0

Dealing with 0x80000000 is pretty easy:

int xIsNegative = 0;  
unsigned int absValOfX = x;  

if (x < 0)
{  
    xIsNegative = 1;  
    absValOfX = -(unsigned int)x;  
}

It gets rid of special casing -2147483648 since that value is representable as an unsigned value, and absValOfX should always be positive.

edited Sep 09 '12 at 19:22

answered Sep 09 '12 at 05:07

MSN

53,214
7
75
105

This typically causes overflow and undefined behavior, because the type of `-x` is `int`, and, when x is -2,147,483,648, the mathematical value of -x would be 2,147,483,648, but that cannot be represented in a signed 32-bit integer. – Eric Postpischil Sep 09 '12 at 17:47
@EricPostpischil, good point; I updated the code to take the negative of an unsigned number, which is still representable. – MSN Sep 09 '12 at 19:22
It says something about the C standard that negating a signed number is wrong, and the fix is to negate an unsigned number. – Eric Postpischil Sep 09 '12 at 20:15

How to manually (bitwise) perform (float)x?

3 Answers3

Linked