
Background:
I am playing around with bit-level coding (this is not homework - just curious). I found a lot of good material online and in a book called Hacker's Delight, but I am having trouble with one of the online problems.

It asks to convert an integer to a float. I used the following links as reference to work through the problem:

How to manually (bitwise) perform (float)x?
How to convert an unsigned int to a float?
http://locklessinc.com/articles/i2f/

Problem and Question:
I thought I understood the process well enough (I tried to document the process in the comments), but when I test it, I don't understand the output.

Test Cases:
float_i2f(2) returns 1073741824
float_i2f(3) returns 1077936128

I expected to see something like 2.0000 and 3.0000.

Did I mess up the conversion somewhere? At first I thought the output might be a memory address and that I had missed a step needed to get at the actual number. Or maybe I am printing it incorrectly? I am printing my output like this:

printf("Float_i2f ( %d ): ", 3);
printf("%u", float_i2f(3));
printf("\n");

But I thought that printing method was fine for unsigned values in C (I'm used to programming in Java).

Thanks for any advice.

Code:

/*
    * float_i2f - Return bit-level equivalent of expression (float) x
    *   Result is returned as unsigned int, but
    *   it is to be interpreted as the bit-level representation of a
    *   single-precision floating point value.
    *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
    *   Max ops: 30
    *   Rating: 4
    */
    unsigned float_i2f(int x) {
        if (x == 0){
            return 0;
        }

        //save the sign bit for later and get the absolute value of x
        //the absolute value is needed to shift bits to put them
        //into the appropriate position for the float
        unsigned int signBit = 0;
        unsigned int absVal = (unsigned int)x;

        if (x < 0){
            signBit = 0x80000000;
            absVal = (unsigned int)-x;
        }

        //Calculate the exponent
        // Shift the input left until the high order bit is set to form the mantissa.
        // Form the floating exponent by subtracting the number of shifts from 158.
        unsigned int exponent = 158; //158 = 127 (exponent bias) + 31 (bit index of the MSB after normalization)

        while ((absVal & 0x80000000) == 0){ //loop until the most significant bit (bit 31) is set
            exponent--;
            absVal <<= 1;
        }

        //shift right so the 24 significant bits land in bits 0-23 (the low 8 bits are simply dropped, i.e. truncated)
        unsigned int mantissa = absVal >> 8;

        //place the exponent bits in the right place
        exponent = exponent << 23;

        //mask off the implicit leading 1, keeping the 23 stored mantissa bits
        mantissa = mantissa & 0x7fffff;

        //return the reconstructed float
        return signBit | exponent | mantissa;
    }
JustBlossom
  • You should be using the `%f` format specifier to tell `printf` to interpret the value as a float. By using `%u`, you have asked it to print an unsigned integer. However, this could be undefined behaviour due to strict aliasing, and the way variable arguments are passed to the function. You may be better off creating a `float` variable and using `memcpy` to copy the resulting integer bits directly into the float (see the sketch after these comments). Endianness will still be a problem. How deep do you wanna go? – paddy Nov 28 '16 at 03:06
  • It looks correct (I didn't go through your calcs). What you are looking at is the unsigned integer *equivalent* of the bits that make up the IEEE-754 single-precision floating point number. You can create a simple `union` of `float` and `uint32_t` and examine the output of both to confirm. – David C. Rankin Nov 28 '16 at 03:06
  • Your code is ok, although it doesn't round, only truncates. – deamentiaemundi Nov 28 '16 at 03:07
  • @deamentiaemundi Thanks! I'm going to go back to that. I wanted to get the truncation part working before I tried to tackle rounding. – JustBlossom Nov 28 '16 at 04:12
  • @paddy Just one or two decimal places. I never thought of Endianness being a problem. From what I read about that, if I restrict the values that I allow someone to enter for the conversion to where I always know how much memory is being used, then that shouldn't be a problem. That seems like an ugly solution to the problem though. I'll read more into it because I know talking about it in this thread is a bit off topic. But thanks! You've given me something to think about. – JustBlossom Nov 28 '16 at 04:19
  • The code is very problematic, as you rely on `unsigned int` having 32 bits. Use fixed-width types! And ensure your `float` uses IEEE754 encoding. – too honest for this site Nov 28 '16 at 04:31
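For reference, the `memcpy` approach paddy describes boils down to something like the following sketch. It assumes, as the last comment warns, that `unsigned int` is 32 bits and that `float` is a 32-bit IEEE-754 type sharing the same byte order:

#include <stdio.h>
#include <string.h>

unsigned float_i2f(int x);   /* the function above */

int main(void) {
    unsigned bits = float_i2f(3);
    float f;
    memcpy(&f, &bits, sizeof f);             /* copy the raw bits into a float */
    printf("Float_i2f ( %d ): %f\n", 3, f);  /* prints Float_i2f ( 3 ): 3.000000 */
    return 0;
}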

2 Answers


Continuing from the comment: your code is correct, and you are simply looking at the unsigned integer made up by the bits of your IEEE-754 single-precision floating point number. The IEEE-754 single-precision format (sign bit, biased exponent, and mantissa) can be interpreted as a float, or those same 32 bits can be interpreted as an unsigned integer (just the number formed by the bits taken as a whole). You are outputting the unsigned equivalent of the floating point number.
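For example, for an input of 2 the loop shifts the value left 30 times before the high bit is set, so the exponent ends up as 158 - 30 = 128 and the mantissa bits are all zero. The function then returns 128 << 23 = 0x40000000, which is 1073741824 when read as an unsigned integer, exactly the value you printed.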

You can confirm with a simple union. For example:

#include <stdio.h>
#include <stdint.h>

typedef union {
    uint32_t u;
    float f;
} u2f;

int main (void) {

    u2f tmp = { .f = 2.0 };
    printf ("\n u : %u\n f : %f\n", tmp.u, tmp.f);

    return 0;
}

Example Usage/Output

$ ./bin/unionuf

 u : 1073741824
 f : 2.000000

Let me know if you have any further questions. It's good to see that your study resulted in the correct floating point conversion. (also note the second comment regarding truncation/rounding)
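Regarding that truncation point, here is a rough sketch of what the routine could look like with a round-to-nearest-even step added. This is purely an illustration (it ignores the exercise's 30-op limit and assumes a 32-bit `unsigned int`, as above):

unsigned float_i2f_round(int x) {
    if (x == 0)
        return 0;

    unsigned signBit = 0;
    unsigned absVal = (unsigned)x;
    if (x < 0) {
        signBit = 0x80000000u;
        absVal = -(unsigned)x;          /* negate in unsigned arithmetic so INT_MIN is safe */
    }

    /* normalize: shift until bit 31 is set, tracking the biased exponent */
    unsigned exponent = 158;            /* 127 (bias) + 31 (bit index after normalizing) */
    while ((absVal & 0x80000000u) == 0) {
        exponent--;
        absVal <<= 1;
    }

    unsigned frac = absVal >> 8;        /* 24 bits: implicit leading 1 + 23 mantissa bits */
    unsigned dropped = absVal & 0xffu;  /* the 8 bits the original code discards */

    /* round to nearest, ties to even */
    if (dropped > 0x80u || (dropped == 0x80u && (frac & 1u)))
        frac++;

    /* the increment can carry out of the 24-bit field; renormalize if it does */
    if (frac >> 24) {
        frac >>= 1;
        exponent++;
    }

    return signBit | (exponent << 23) | (frac & 0x7fffffu);
}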

David C. Rankin
  • Thanks a bunch. I have to do a bit more reading, but I think I get it. For lack of a better way to put it, floats and unsigned ints tell two different stories. So, they have two different specifications that describe them. This also means that the bit level representation of each will be different. So, when printing them out, we are telling C which specification to use. I was telling the program to use the unsigned version. Also, I'll make sure to go back to the rounding piece. This took a while to understand, so I am trying to take everything one step at a time. – JustBlossom Nov 28 '16 at 04:11
  • Yes, you have it. 32-bits are just 32-bits. If you look at them through the `float` window (e.g. as a *sign-bit*, followed by 8-bits of *exponent* and 23-bits of *mantissa*) you will see what they represent as a float; when you look through the `unsigned` window (considering bits 0-31 as a whole) you get what the `unsigned` value for those bits would be. Either way, they are the same bits. It's just how floats are represented versus what we call integers. – David C. Rankin Nov 28 '16 at 04:30

I'll just chime in here, because nothing specifically about endianness has been addressed. So let's talk about it.

  1. The construction of the value in the original question was endianness-agnostic, using shifts and other bitwise operations. This means that regardless of whether your system is big- or little-endian, the actual value will be the same. The difference will be its byte order in memory.

  2. The generally accepted convention when IEEE-754 values are exchanged between systems is big-endian byte order (although there is no formal specification of this, and therefore no requirement on implementations to follow it). On any one machine, however, a `float` is stored in whatever byte order that platform uses for floating point values, and that is what matters if you want to reinterpret your integer as a float in memory.

So, you can use this approach combined with a union if and only if you know that the endianness of floats and integers on your system is the same.

On the common Intel-based architectures this condition is met: integers and floats are both stored little-endian, so the bits line up and you can copy the integer straight into a `float`:

uint32_t n = float_i2f( input_val );
float fval;
memcpy( &fval, &n, sizeof(float) );

If instead you need to hand the value to another system in the conventional big-endian order, repack the bytes explicitly with shifts, so the result does not depend on the host's byte order:

uint8_t bytes[4] = {
    (uint8_t)((n >> 24) & 0xff),
    (uint8_t)((n >> 16) & 0xff),
    (uint8_t)((n >> 8) & 0xff),
    (uint8_t)(n & 0xff)
};

I'll stress that you only need to worry about byte order when the representation leaves your program (a file, a socket, another machine); reinterpreting your integer representation as a float on the same system is just a matter of copying the bits, as shown above.

If you're only trying to output what the representation is in bits, then you don't need to worry. You can just display your integer in a useful form such as hex:

printf( "0x%08x\n", n );
paddy