Storing ints as floats

Question

Suppose I have an API that only allows me to store floats, or arrays of floats. However, I would like to be storing integer values here.

I (roughly) understand that I am pretty okay with a straight cast up to around 2^23, but what if I want to go higher? Is there any way that I can take advantage of more of the 32 bits of a float and be sure I will get the same number back?

For clarification:

I'm doing some operations on point clouds with Pixar's PRMan (i.e., RenderMan). I can write in either C or C++, linking against the precompiled point cloud API. PRMan at no point has to use these ints I am storing; I only need it to hand them back to me intact after operating on other data attached to the points.

It's also important to note which platform you're on to know how floats are implemented. But it's probably a *really* safe bet that you're using IEEE floats. — John Dibling, Nov 11 '10 at 00:30

Oliver Charlesworth · Accepted Answer · 2010-11-11T10:50:59.563

9

Questionable:

In C, you can do the following, which is potentially unsafe (due to strict-aliasing rules):

int i = XXX;
float f;
*(int *)&f = i;

and which relies on the assumption that sizeof(int) == sizeof(float).

Less questionable:

Safer, but more longwinded, is the following:

int i = XXX;
float f;
memcpy(&f, &i, sizeof(int));

This still relies on matching data-type sizes. However, both of the above make the assumption that the internals of the library you're using will do nothing at all to the data. For instance, it won't have any special handling for NaN, or +/-infinity, etc.
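
As a rough illustration of the memcpy approach, here is a minimal sketch. It assumes a 32-bit float and, more importantly, that whatever stores the float preserves every bit pattern, including ones that decode as NaN or denormals. The helper names `store_int_bits` and `load_int_bits` are hypothetical and not part of any library:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* The trick only works when both types occupy the same number of bytes. */
static_assert(sizeof(int32_t) == sizeof(float),
              "int32_t and float must have the same size");

/* Hypothetical helper: copy the integer's bit pattern into a float slot.
   No value conversion takes place; the float's numeric value is meaningless. */
void store_int_bits(float *slot, int32_t i)
{
    memcpy(slot, &i, sizeof *slot);
}

/* Hypothetical helper: recover the integer from the float slot's bit pattern. */
int32_t load_int_bits(const float *slot)
{
    int32_t i;
    memcpy(&i, slot, sizeof i);
    return i;
}
```

Calling `store_int_bits(&f, i)` before handing `f` to the API and `load_int_bits(&f)` afterwards returns the original value only if the API copied the float bit for bit.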

Safe:

Along an entirely different train of thought, if you're happy to waste two floats per int, you could do something like:

int i = XXX;
float f[2] = { (i & 0xFFFF), ((unsigned)i >> 16) };

This last one is safe (other than some pretty reasonable assumptions on the size of floats and ints).
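
A sketch of the full round trip for this two-float scheme might look like the following. It assumes a 32-bit integer and a float with at least 16 bits of significand, so each half is stored as an exactly representable value. The names `split_int` and `join_int` are hypothetical:

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Each half is at most 65535, so 16 significand bits are enough. */
static_assert(FLT_MANT_DIG >= 16,
              "float must represent every 16-bit integer exactly");

/* Hypothetical helper: split a 32-bit int into two floats holding 16 bits each. */
void split_int(int32_t i, float out[2])
{
    uint32_t u = (uint32_t)i;        /* well-defined conversion modulo 2^32 */
    out[0] = (float)(u & 0xFFFFu);   /* low 16 bits, in 0..65535 */
    out[1] = (float)(u >> 16);       /* high 16 bits, in 0..65535 */
}

/* Hypothetical helper: rebuild the int from the two floats. */
int32_t join_int(const float in[2])
{
    uint32_t u = ((uint32_t)in[1] << 16) | (uint32_t)in[0];
    int32_t i;
    memcpy(&i, &u, sizeof i);        /* avoids an implementation-defined cast */
    return i;
}
```

Because the split works on the unsigned bit pattern, 32768 and -32768 map to different pairs ({32768, 0} and {32768, 65535}), and both halves are ordinary non-negative floats that no NaN or infinity handling should disturb.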

edited Nov 11 '10 at 10:50

answered Nov 10 '10 at 23:13


Wouldn't you just use i >> 16 for f[1]? – MSN Nov 11 '10 at 00:33
Some float bit patterns are reserved and cannot be used. So although you may be able to write them over a float variable in memory, passing them to some form of storage may break the storage container in unpredictable ways. We have no information about the storage and thus cannot guarantee this will work. – Martin York Nov 11 '10 at 01:09
@Martin: Yes, I've alluded to this in my answer. – Oliver Charlesworth Nov 11 '10 at 08:16
@MSN: Because right-shifting negative values is implementation-defined. – Oliver Charlesworth Nov 11 '10 at 08:18
This method is wrong, because it is ambiguous. For example, the numbers 32768 and -32768 are both stored as {32768.0, 0.0}. The reason is that integer division is rounded toward 0, not -inf. – Vovanium Nov 11 '10 at 10:43
If I use the "safe" method, should I be adding a constant < 1 to all of the stored floats before casting them back to ints, to compensate for any possible imprecision in the float putting it ever so slightly below the integer that I meant to store? I.e., will 12344.9999 be cast to 12345, or should I add 0.5 to it? – Mike Boers Nov 11 '10 at 14:29
I'll write a tool which tries these methods, along with a few others, in an attempt to assert what will survive through the API. I'll edit my answer afterwards for completeness. =] Thanks! – Mike Boers Nov 11 '10 at 14:31
@Mike: There should be no imprecision (if there is, something's gone wrong!). Both `(i & 0xFFFF)` and `((unsigned)i >> 16)` should be restricted to a 16-bit range, and should fit comfortably within the mantissa of a `float`; i.e. they should both be exactly representable. – Oliver Charlesworth Nov 11 '10 at 14:58
@Oli: Having finally given a very thorough read to "What every CS should know about floats", I finally understand that. Thanks! – Mike Boers Nov 11 '10 at 15:16
Okay... So we actually completely changed the pipeline and I don't have to do this anymore! If I ever do get around to testing this on the real API I will post my results. Have a checkmark! – Mike Boers Nov 12 '10 at 13:04

score 4 · Answer 2 · answered Nov 11 '10 at 00:10

The mantissa field lets you store 23 bits. The exponent field lets you store almost 8 bits: it is 8 bits wide, with a few values reserved. And there's a sign bit.

Avoiding the reserved values in the exponent, you can still store 31 bits of your choice.

You may find frexp and ldexp useful.
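
One way this could be sketched with `ldexpf` and `frexpf`, assuming IEEE 754 single precision: put bit 30 of the value in the sign, bits 23..29 in a restricted exponent range, and bits 0..22 in the fraction. The result is always a normal, non-zero, finite float, so no reserved exponent is ever produced. The names `pack31` and `unpack31` and the particular exponent offset are illustrative choices, not something the answer specifies:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: encode a 31-bit value as a finite, normal float. */
float pack31(uint32_t v)
{
    assert(v < (UINT32_C(1) << 31));          /* only 31 bits are supported */
    uint32_t frac = v & 0x7FFFFFu;            /* bits 0..22 */
    int exp7 = (int)((v >> 23) & 0x7Fu);      /* bits 23..29, in 0..127 */
    int neg = (int)(v >> 30);                 /* bit 30 becomes the sign */
    /* 2^23 + frac is a 24-bit integer, hence exact in a float. Scaling by
       2^(exp7 - 87) keeps the magnitude inside [2^-64, 2^64). */
    float mag = ldexpf((float)(0x800000u | frac), exp7 - 87);
    return neg ? -mag : mag;
}

/* Hypothetical helper: decode a float produced by pack31. */
uint32_t unpack31(float f)
{
    int e;
    uint32_t neg = f < 0.0f;
    float m = frexpf(neg ? -f : f, &e);       /* |f| = m * 2^e, m in [0.5, 1) */
    uint32_t frac = (uint32_t)ldexpf(m, 24) - 0x800000u;
    uint32_t exp7 = (uint32_t)(e + 63);
    return (neg << 30) | (exp7 << 23) | frac;
}
```

Unlike a raw bit copy, this goes through the float's numeric value, so it should survive any storage that preserves ordinary finite floats exactly. It costs one bit: only 31 of the 32 bits are usable.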

score 0 · Answer 3 · answered Nov 11 '10 at 00:32

All of the answers given here assume you only want to use the bytes reserved for float storage as a place to store an int. They will not allow you to perform arithmetic on the int-encoded-in-float values. If you want arithmetic to work, you're stuck with 24.99 bits (i.e. a range of -(2^24-1) to (2^24-1); I consider the sign bit as 0.99 bits rather than 1 bit because you can't store the lowest-possible value with float's sign/magnitude representation) and a sparse set of larger values (e.g. any 32-bit integer multiple of 256 is representable in float).
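
To check whether a particular integer survives a plain cast, a small test like the following sketch may help. It assumes IEEE 754 single precision (24-bit significand) and a double wide enough to hold every 32-bit integer exactly; `fits_in_float` is a hypothetical name:

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <stdint.h>

static_assert(FLT_MANT_DIG == 24, "assumes IEEE 754 single precision");
static_assert(DBL_MANT_DIG >= 32, "double must hold any int32_t exactly");

/* Hypothetical helper: returns 1 if i converts to float without rounding.
   The comparison is done in double to avoid converting an out-of-range
   float (e.g. 2^31) back to int32_t, which would be undefined. */
int fits_in_float(int32_t i)
{
    return (double)(float)i == (double)i;
}
```

For example, 16777216 (2^24) passes, 16777217 does not, and a large multiple of 256 such as 2147483392 passes again, which matches the sparse set of larger values described above.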

That should be 24 bits, no? Single-precision has 1 sign bit and 23 mantissa bits. — Oliver Charlesworth, Nov 11 '10 at 08:31
