convert unsigned int to single precision floating point with only integer operations

Question

The idea is for this function to do essentially the same thing as the converter at http://www.h-schmidt.net/FloatConverter/IEEE754.html

For example, if I plug in 357, I should get 0x43b28000.
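Working the example by hand shows where each part of 0x43b28000 comes from: one sign bit, an 8-bit exponent biased by 127, and a 23-bit fraction with the leading 1 left implicit.

```latex
\begin{aligned}
357 &= 101100101_2 = 1.01100101_2 \times 2^{8} \\
\text{sign} &= 0 \\
\text{exponent} &= 127 + 8 = 135 = 10000111_2 \\
\text{fraction} &= 01100101\,000000000000000_2 \quad (\text{23 bits, leading 1 dropped}) \\
\text{bits} &= 0\;10000111\;01100101000000000000000_2 = \texttt{0x43B28000}
\end{aligned}
```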

float unsignedToFloat( unsigned int x ) {
    unsigned int result = 0;
    /* TODO: set result to the IEEE-754 bit pattern of x */
    return *(float*)&result;
}

But how do I do this?

I saw the question "Convert ieee 754 float to hex with c - printf", but there didn't seem to be any good solutions there.

I don't understand. Your function returns a float. How can it return `0x43b28000`? Do you mean it returns `357.0`? (The first page you linked answers this question at the bottom, second question in their FAQ.) The question is meaningless. This is like asking how to convert the "two" in "two mushrooms" into the "two" in "two cars". There's nothing to do. — David Schwartz, Mar 15 '13 at 07:34
In this case, 0x43b28000 is the binary (hexadecimal) representation of a 32 bit single. I don't have code, and no time at the moment, but if I have a little time later on, I'll describe how it can be done (if no one else does). It merely involves some shifting and some adding/subtracting. — Rudy Velthuis, Mar 15 '13 at 08:15
@roger_rowland: I have the impression userXYZ wants to know how this works, in place, and not let a compiler do it for him. — Rudy Velthuis, Mar 15 '13 at 08:17
@Rudy Velthuis - well, I followed the link in the OP's post and it describes how that Java app does it, which is exactly how `reinterpret_cast` works. Is that not what he was asking? — Roger Rowland, Mar 15 '13 at 08:19
Quoting from the FAQ in that link "This source code for this converter doesn't contain any low level conversion routines. The conversion between a floating point number (i.e. a 32 bit area in memory) and the bit representation isn't actually a conversion, but just a reinterpretation of the same data in memory. This can be easily done with typecasts in C/C++" — Roger Rowland, Mar 15 '13 at 08:20
Does it have to be 32 bit float? That requires rounding. All 32 bit integers are exactly representable in IEEE 754 64-bit. — Patricia Shanahan, Mar 15 '13 at 08:47
As I see it, he wants to turn an int (357) into a float with binary value 0x43b28000. Of course that is easy: just assign it. But if he wants to know how this can be done without relying on the runtime, i.e. directly in his routine with mere integer arithmetic, he'll have to shift and count bits, etc. ISTM that the question is how this can be done. — Rudy Velthuis, Mar 15 '13 at 10:49
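To check an integer-only implementation against the converter the question links to, a reference value can be produced the way the quoted FAQ describes: let the compiler convert, then reinterpret the bytes. This is a minimal sketch, not an integer-only solution; the name `float_bits_reference` is hypothetical, and it assumes `float` is IEEE-754 binary32, as on mainstream platforms.

```c
#include <stdint.h>
#include <string.h>

/* Reference only (NOT integer-only): the compiler converts x to float,
   then memcpy copies the float's bytes into a uint32_t to expose its
   IEEE-754 bit pattern. memcpy is the portable way to reinterpret bytes;
   a pointer cast such as *(uint32_t *)&f breaks C's strict-aliasing rules.
   Assumes float is 32-bit IEEE-754 binary32. */
uint32_t float_bits_reference(uint32_t x)
{
    float f = (float)x;   /* typically rounds to nearest, ties to even */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, `float_bits_reference(357)` gives 0x43b28000, the value the question expects.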

Stephen Canon · Answer 1 · 2013-03-15T11:45:23.950

First, if x is zero, return zero.

Next find the index of the highest-order non-zero bit in x. Call it i.
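One way to find i with integer operations alone is to shift right until nothing is left. This is a sketch with a hypothetical helper name; on GCC and Clang, `31 - __builtin_clz(x)` computes the same index for nonzero x.

```c
#include <stdint.h>

/* Index (0..31) of the highest set bit of x, found by shifting x right
   until it becomes zero. Returns 0 for x == 0 as well, so callers must
   handle zero first, as the first step of the answer does. */
int highest_set_bit(uint32_t x)
{
    int i = 0;
    while (x >>= 1)   /* each shift moves the top bit one place down */
        i++;
    return i;
}
```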

If i is less than 24, left-shift x by 23 - i to get a normalized significand. Now clear bit 23 to hide the implicit bit, and set bits 23:30 to 127 + i, which is the biased exponent. Return the result.
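The exact case (i < 24, so no rounding is needed) might look like this in C. This is a minimal sketch assuming a 32-bit unsigned input; `small_u32_to_float_bits` is a hypothetical name, and it returns the bit pattern rather than a `float`.

```c
#include <stdint.h>

/* Exact path only: requires x != 0 and x < 2^24 (highest set bit i < 24),
   so the whole value fits in the 24-bit significand without rounding.
   Returns the IEEE-754 binary32 bit pattern. */
uint32_t small_u32_to_float_bits(uint32_t x)
{
    int i = 0;                      /* index of the highest set bit */
    uint32_t t = x;
    while (t >>= 1)
        i++;

    uint32_t sig = x << (23 - i);   /* normalize: leading 1 lands on bit 23 */
    sig &= ~(1u << 23);             /* clear bit 23: the implicit bit is not stored */
    return ((uint32_t)(127 + i) << 23) | sig;   /* biased exponent in bits 23..30 */
}
```

With x = 357, i is 8, the exponent field is 135, and the result is 0x43b28000.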

Otherwise, right-shift x by i - 23 to get a normalized significand via truncation, and clear the implicit bit and set the exponent as above. If your desired rounding mode is truncation or round-to-minus-infinity, you are done. Otherwise, you will need to look at the bits that were shifted off the bottom of x. If the desired rounding mode is round-to-plus-infinity and any of those bits are set, add one to the result and return. Finally, if the desired rounding mode is round-to-nearest-ties-to-even (IEEE-754 default), there are three cases:

If the trailing bits are b0... (below halfway): return the truncated result.
If the trailing bits are b1000... (exactly halfway): call the truncated result t and return t + (t & 1), i.e. round up only if t is odd.
If the trailing bits are b1...1..., a leading 1 followed by at least one more set bit (above halfway): add one to the truncated result and return.
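Putting the steps of this answer together gives something like the following minimal sketch. It assumes a 32-bit input (`uint32_t`) and binary32 output; `enum round_mode` and `u32_to_float_bits` are hypothetical names chosen for illustration. Round-down is not listed separately because, for non-negative inputs, it is the same as truncation.

```c
#include <stdint.h>

/* Hypothetical rounding-mode selector for this sketch; not a standard type. */
enum round_mode { ROUND_TRUNCATE, ROUND_UPWARD, ROUND_NEAREST_EVEN };

/* Returns the IEEE-754 binary32 bit pattern of x, rounded per mode,
   using only integer operations. */
uint32_t u32_to_float_bits(uint32_t x, enum round_mode mode)
{
    if (x == 0)
        return 0;                               /* +0.0f */

    int i = 0;                                  /* index of highest set bit */
    uint32_t t = x;
    while (t >>= 1)
        i++;

    uint32_t exp_bits = (uint32_t)(127 + i) << 23;  /* biased exponent, bits 23..30 */

    if (i < 24)                                 /* exact: shift left, hide bit 23 */
        return exp_bits | ((x << (23 - i)) & 0x7FFFFFu);

    int shift = i - 23;                         /* 1..8 for a 32-bit input */
    uint32_t result = exp_bits | ((x >> shift) & 0x7FFFFFu);  /* truncated t */
    uint32_t lost = x & ((1u << shift) - 1);    /* bits shifted off the bottom */
    uint32_t half = 1u << (shift - 1);          /* the pattern b1000... */

    switch (mode) {
    case ROUND_TRUNCATE:                        /* same as round-down for x >= 0 */
        break;
    case ROUND_UPWARD:
        if (lost != 0)
            result += 1;
        break;
    case ROUND_NEAREST_EVEN:
        if (lost > half)                        /* b1...1...: above halfway */
            result += 1;
        else if (lost == half)                  /* b1000...: tie, t + (t & 1) */
            result += result & 1;
        break;                                  /* b0...: below halfway, keep t */
    }
    /* A carry out of the fraction bits bumps the exponent field, giving the
       next power of two, so no separate overflow case is needed. */
    return result;
}
```

The final comment is the reason "add one to the result" works as stated: for 0xFFFFFFFF the truncated pattern is 0x4f7fffff, and adding one yields 0x4f800000, which is exactly 2^32.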

Rudy Velthuis · Accepted Answer · 2013-03-15T21:06:29.030


I did it this way:

UPDATED

This one handles 0, INT_MIN and rounding correctly, AFAICT:

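#include <stdint.h>   /* uint32_t */
#include <string.h>   /* memcpy */

#define SIGN_MASK       0x80000000u
#define HIDDEN_MASK     (1u << 23)
#define MANTISSA_MASK   (HIDDEN_MASK - 1)

float intToFloat(int n)
{
    uint32_t sign, mag, half, bits;
    int exp = 0;
    float result;

    if (n == 0)
         return 0.0f;

    /* Work on the magnitude as an unsigned value. This also covers INT_MIN:
       -n would overflow, but 0u - (uint32_t)INT_MIN is 0x80000000 == 2^31. */
    sign = n < 0 ? SIGN_MASK : 0;
    mag = sign ? 0u - (uint32_t)n : (uint32_t)n;

    if (!(mag & ~(HIDDEN_MASK | MANTISSA_MASK)))
    {
        /* At most 24 significant bits: shift left until the hidden bit is set. */
        for (; !(mag & HIDDEN_MASK); mag <<= 1, exp--) ;
    }
    else
    {
        /* Too many bits: shift right, collecting the shifted-off bits in half,
           left-aligned so that 0x80000000 means "exactly halfway". */
        half = 0;
        for (; mag & ~(HIDDEN_MASK | MANTISSA_MASK); exp++)
        {
            half >>= 1;
            if (mag & 1)
                half |= 0x80000000u;
            mag >>= 1;
        }
        /* Round once, to nearest with ties to even. */
        if (half > 0x80000000u || (half == 0x80000000u && (mag & 1)))
        {
            mag++;
            if (mag == 0x1000000u)   /* rounding carried out to 2^24 */
            {
                mag = HIDDEN_MASK;   /* or 0, doesn't matter: only mantissa bits are kept */
                exp++;
            }
        }
    }

    /* mag is 1.xxx * 2^23, so the value is 1.xxx * 2^(exp + 23). */
    bits = sign | ((uint32_t)(exp + 127 + 23) << 23) | (mag & MANTISSA_MASK);

    /* Reinterpret the bits as a float; memcpy avoids the strict-aliasing
       problem of *((float *)&n). */
    memcpy(&result, &bits, sizeof result);
    return result;
}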

edited Mar 15 '13 at 21:06

answered Mar 15 '13 at 13:31

Rudy Velthuis


Your rounding logic is incorrect, as it re-rounds every time a bit is shifted off the bottom. This will cause severe double-rounding problems. (Consider the case where three bits are to be rounded off and they are `b011`. This should round down. Your first rounding will round up, giving `b10`, which causes the final result to round the wrong way if the low bit of the truncated result is set.) There are ways to do rounding a bit at a time like this, but in software it is much simpler to consider the entire field to be rounded in one pass. – Stephen Canon Mar 15 '13 at 13:37
This fails when `n` is `INT_MIN`. Then `-n` overflows (in common C implementations), and `n >>= 1` causes problems. Also, why return `*((float *)&n)` (which is also not guaranteed by C) instead of simply 0? – Eric Postpischil Mar 15 '13 at 13:40
You're right about 0 and INT_MIN. I would return constants then. – Rudy Velthuis Mar 15 '13 at 14:12
@StephenCanon: you're right. I'll change it to round only once, after the shift. – Rudy Velthuis Mar 15 '13 at 14:33
@StephenCanon: Try it now. I tried all values from INT_MIN to INT_MAX and found no error anymore. – Rudy Velthuis Mar 15 '13 at 16:34
He said using only integer operations. `*((float *)&n)` is not an integer operation. – David Schwartz Mar 16 '13 at 01:44
`*((float *)&n)` is, on the surface, a pointer operation. But no conversion takes place, it merely re-interprets the bits of the integer as a float. IOW, it is the same as a reinterpret_cast in C++. Since the result is declared as float, that is the way to return the integer bits as float. – Rudy Velthuis Mar 16 '13 at 15:29
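Stephen Canon's double-rounding example can be reproduced with a small sketch. `round_once` and `round_bitwise` are hypothetical helpers, and `round_bitwise` is only a simplified model of "re-round every time a bit is shifted off", not the answer's earlier code. Take 0b1011 and drop its low three bits `b011`: the exact value is 11/8 = 1.375, so the correctly rounded result is 1.

```c
#include <stdint.h>

/* Drop the low k bits of v (k >= 1) in one step,
   rounding to nearest with ties to even. */
uint32_t round_once(uint32_t v, int k)
{
    uint32_t kept = v >> k;
    uint32_t lost = v & ((1u << k) - 1);   /* the bits being rounded off */
    uint32_t half = 1u << (k - 1);         /* the pattern b1000... */
    if (lost > half || (lost == half && (kept & 1)))
        kept++;
    return kept;
}

/* Flawed variant: drop one bit at a time, re-rounding after each shift. */
uint32_t round_bitwise(uint32_t v, int k)
{
    while (k-- > 0)
        v = round_once(v, 1);
    return v;
}
```

Here `round_once(0xB, 3)` returns 1, while `round_bitwise(0xB, 3)` returns 2: the first step turns `b011` into a tie and rounds up, and the later tie then rounds up again because the kept bit is odd.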
