Portable way to serialize float as 32-bit integer

Question

I have been struggling with finding a portable way to serialize 32-bit float variables in C and C++ to be sent to and from microcontrollers. I want the format to be well-defined enough so that serialization/de-serialization can be done from other languages as well without too much effort. Related questions are:

Portability of binary serialization of double/float type in C++

Serialize double and float with C

c++ portable conversion of long to double

I know that in most cases a ~~typecast~~ union/memcpy will work just fine because the float representation is the same, but I would prefer to have a bit more control and peace of mind. What I came up with so far is the following:

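#include <math.h>
#include <stdbool.h>
#include <stdint.h>

// Writes number as a 32-bit word (1 sign bit, 8 exponent bits, 23 fraction
// bits), most significant byte first, and advances *index by 4.
void serialize_float32(uint8_t* buffer, float number, int32_t *index) {
    int e = 0;
    float sig = frexpf(number, &e);   // number == sig * 2^e, 0.5 <= |sig| < 1
    float sig_abs = fabsf(sig);
    uint32_t sig_i = 0;

    if (sig_abs >= 0.5) {
        // Drop the implicit leading bit and scale the remainder to 23 bits
        sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
        e += 126;                      // frexp exponent -> biased exponent
    }

    // Cast before shifting: int may be only 16 bits on a microcontroller,
    // and shifting a signed 1 into bit 31 is undefined behaviour
    uint32_t res = ((uint32_t)(e & 0xFF) << 23) | (sig_i & 0x7FFFFF);
    if (sig < 0) {
        res |= (uint32_t)1 << 31;
    }

    buffer[(*index)++] = (res >> 24) & 0xFF;
    buffer[(*index)++] = (res >> 16) & 0xFF;
    buffer[(*index)++] = (res >> 8) & 0xFF;
    buffer[(*index)++] = res & 0xFF;
}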

and

// Reads the 4 bytes written by serialize_float32 and advances *index by 4.
float deserialize_float32(const uint8_t *buffer, int32_t *index) {
    uint32_t res = ((uint32_t) buffer[*index]) << 24 |
                ((uint32_t) buffer[*index + 1]) << 16 |
                ((uint32_t) buffer[*index + 2]) << 8 |
                ((uint32_t) buffer[*index + 3]);
    *index += 4;

    int e = (res >> 23) & 0xFF;
    uint32_t sig_i = res & 0x7FFFFF;
    bool neg = res & ((uint32_t)1 << 31);   // unsigned shift: 1 << 31 overflows int

    float sig = 0.0f;
    if (e != 0 || sig_i != 0) {
        // Restore the implicit leading bit, giving sig in [0.5, 1)
        sig = (float)sig_i / (8388608.0 * 2.0) + 0.5;
        e -= 126;
    }

    if (neg) {
        sig = -sig;
    }

    return ldexpf(sig, e);
}

The frexp and ldexp functions seem to be made for this purpose, but in case they aren't available I tried to implement them manually as well using functions that are common:

float frexpf_slow(float f, int *e) {
    if (f == 0.0f) {
        *e = 0;
        return 0.0f;
    }

    *e = (int)ceilf(log2f(fabsf(f)));
    float res = f / powf(2.0f, (float)*e);

    // Make sure that the magnitude stays below 1 so that no overflow occurs
    // during serialization. This seems to be required after doing some manual
    // testing.

    if (res >= 1.0f) {
        res *= 0.5f;   // halve (not subtract 0.5) so that res * 2^e still equals f
        *e += 1;
    }

    if (res <= -1.0f) {
        res *= 0.5f;
        *e += 1;
    }

    return res;
}

and

float ldexpf_slow(float f, int e) {
    return f * powf(2.0, (float)e);
}
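For comparison, here is a sketch of helpers that avoid `log2f` and `powf` entirely; the names `frexpf_by_halving` and `ldexpf_by_doubling` are hypothetical, not from the question. It assumes `FLT_RADIX == 2`, so multiplying by 2 or 0.5 only moves the exponent: the frexp variant is then exact for every finite input (Inf and NaN are returned unchanged), and the ldexp variant is exact unless the result overflows or becomes subnormal.

```c
#include <math.h>

/* Hypothetical log2f/powf-free frexpf: scale by 2 until |f| is in [0.5, 1). */
float frexpf_by_halving(float f, int *e) {
    *e = 0;
    if (f == 0.0f || !isfinite(f)) {
        return f;                  /* zero, Inf and NaN are returned unchanged */
    }
    while (fabsf(f) >= 1.0f) {     /* large values: halve, result stays normal */
        f *= 0.5f;
        (*e)++;
    }
    while (fabsf(f) < 0.5f) {      /* small and subnormal values: double */
        f *= 2.0f;
        (*e)--;
    }
    return f;
}

/* Hypothetical powf-free ldexpf: multiply by 2 or 0.5, |e| times. */
float ldexpf_by_doubling(float f, int e) {
    while (e > 0) {
        f *= 2.0f;
        e--;
    }
    while (e < 0) {
        f *= 0.5f;
        e++;
    }
    return f;
}
```

The loops run at most a few hundred iterations for `float`, which is slower than a library `ldexpf`, but the result no longer depends on how accurately `powf` handles powers of 2.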

One thing I have been considering is whether to use 8388608 (2^23) or 8388607 (2^23 - 1) as the multiplier. The documentation says that frexp returns values that are less than 1 in magnitude, and after some experimentation it seems that 8388608 gives results that are bit-accurate with actual floats and I could not find any corner case where this overflows. That might not be true with a different compiler/system though. If this can become a problem a smaller multiplier which reduces the accuracy a bit is fine with me as well. I know that this does not handle Inf or NaN, but for now that is not a requirement.
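The overflow question for 8388608 can be checked directly. A small sketch, assuming binary floats with a 24-bit significand (IEEE 754 single precision); the helper names are hypothetical. The largest magnitude `frexpf` can return is the float just below 1.0, that is 1 - 2^-24, and every step of the question's formula is exact for it, so it lands on 2^23 - 1 = 0x7FFFFF, the all-ones 23-bit field, and cannot spill into the exponent.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: the question's fraction formula for a frexpf
   magnitude in [0.5, 1). */
uint32_t fraction_field(float sig_abs) {
    return (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
}

/* Largest frexpf magnitude: nextafterf(1.0f, 0.0f) == 1 - 2^-24.
   (1 - 2^-24) - 0.5 = 0.5 - 2^-24, times 2 = 1 - 2^-23,
   times 2^23 = 2^23 - 1 = 0x7FFFFF; every intermediate is representable. */
uint32_t largest_fraction_field(void) {
    return fraction_field(nextafterf(1.0f, 0.0f));
}
```

The smallest magnitude, 0.5, maps to 0, so with 2^23 the whole range [0.5, 1) fills the 23-bit field exactly.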

So, finally, my question is: Does this look like a reasonable approach, or am I just making a complicated solution that still has portability issues?

Short answer: You can't really do that in a portable way, unless using a de-/serialization library/tool such as Google protobuf. — πάντα ῥεῖ, Nov 04 '16 at 06:37
So what is the problem with the approach that I presented? What I usually read is that the float representation cannot be guaranteed to be the same on all systems, so my attempt aims to generate something that is always the same regardless of the internal representation of float. — Benjamin Vedder, Nov 04 '16 at 06:52
As far as I know, bit shifts are independent of endianness: http://stackoverflow.com/questions/7184789/does-bit-shift-depend-on-endianness I'm not using any typecasts. — Benjamin Vedder, Nov 04 '16 at 06:59
@rici could it be that if frexp returns something close enough to 1, the multiplication actually rounds upwards? — Benjamin Vedder, Nov 04 '16 at 07:50
If all of the microcontrollers use IEEE 754, then just send the float? — M.M, Nov 04 '16 at 08:19
You might be interested in Python's solution to this: see the `_PyFloat_Unpack4` and `_PyFloat_Pack4` C functions [here](https://github.com/python/cpython/blob/c30098c8c6014f3340a369a31df9c74bdbacc269/Objects/floatobject.c#L2409). The code is well-exercised and well-tested and covers both IEEE 754 format and other formats, including all the corner cases. — Mark Dickinson, Nov 04 '16 at 08:59
N.B. You *really* want to use `ldexpf` rather than `powf` if possible (e.g., if `FLT_RADIX` is `2`). `powf` is a complicated beast that may well not produce exactly correct values even for powers of `2`; `ldexp` is much less likely to produce non-correctly-rounded results (and will probably be faster, too). — Mark Dickinson, Nov 04 '16 at 09:52

2501 · Answer 1 · 2016-11-04T10:05:16.257

7

Assuming the float is in IEEE 754 format, extracting the mantissa, exponent and sign is completely portable:

#include <stdint.h>
#include <string.h>

uint32_t internal;
float value = 0.0f; // ...some value
memcpy( &internal , &value , sizeof( value ) );

const uint32_t sign =     ( internal >> 31u ) & 0x1u;
const uint32_t mantissa = ( internal >> 0u  ) & 0x7FFFFFu;
const uint32_t exponent = ( internal >> 23u ) & 0xFFu;

Invert the procedure to construct the float.
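Inverting the procedure might look like this minimal sketch, under the same assumption that `float` is a 32-bit IEEE 754 type (`float_from_fields` is a hypothetical name): shift each field back into place and `memcpy` the word into a `float`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical inverse of the extraction: reassemble sign, exponent and
   mantissa into one word, then copy its bytes into a float. */
float float_from_fields(uint32_t sign, uint32_t exponent, uint32_t mantissa) {
    const uint32_t internal = ((sign & 0x1u) << 31u) |
                              ((exponent & 0xFFu) << 23u) |
                              (mantissa & 0x7FFFFFu);
    float value;
    memcpy(&value, &internal, sizeof value);   /* assumes sizeof(float) == 4 */
    return value;
}
```

For example, `float_from_fields(1, 128, 0x400000)` gives -3.0f: sign set, exponent 128 - 127 = 1, significand 1.5.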

If you only want to send the entire float, then just copy it to the buffer. This will work even if float is not IEEE 754, but it must be 32 bits and the endianness of both integer and floating-point types must be the same:

buffer[0] = ( internal >> 0u  ) & 0xFFu;
buffer[1] = ( internal >> 8u  ) & 0xFFu;
buffer[2] = ( internal >> 16u ) & 0xFFu;
buffer[3] = ( internal >> 24u ) & 0xFFu;
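The receiving side can undo those four assignments with shifts. This is a sketch under the same assumptions the answer states (32-bit `float`, same byte order for `float` and `uint32_t`), and `float_from_buffer` is a hypothetical name. The integer reassembly does not depend on the receiver's byte order; only the final `memcpy` relies on the stated assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reader for the layout above: least significant byte first. */
float float_from_buffer(const uint8_t buffer[4]) {
    const uint32_t internal = (uint32_t)buffer[0] |
                              ((uint32_t)buffer[1] << 8) |
                              ((uint32_t)buffer[2] << 16) |
                              ((uint32_t)buffer[3] << 24);
    float value;
    memcpy(&value, &internal, sizeof value);   /* assumes sizeof(float) == 4 */
    return value;
}
```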

edited Nov 04 '16 at 10:05

answered Nov 04 '16 at 07:05

2501


If I'm assuming that I don't need to extract them at all, then I can just do the typecast right away. – Benjamin Vedder Nov 04 '16 at 07:12
@BenjaminVedder What do you mean? – 2501 Nov 04 '16 at 07:16
Then I could have done that: uint32_t internal; float value = //...some value memcpy( &internal , &value , sizeof( value ) ); buffer[(*index)++] = (internal >> 24) & 0xFF; buffer[(*index)++] = (internal >> 16) & 0xFF; buffer[(*index)++] = (internal >> 8) & 0xFF; buffer[(*index)++] = internal & 0xFF; The whole point of making it this complicated is that I want to be able to deal with cases with non-standard float representations, but maybe that is not an issue in 2016 in practice. (edit: sorry about the format, the comment does not seem to support newline) – Benjamin Vedder Nov 04 '16 at 07:30
@2501: can you support your claim with references from the C Standard? I doubt you can assume the same endianness for `uint32_t` and `float` as you seem to imply. – chqrlie Nov 04 '16 at 09:50
@chqrlie The standard doesn't know what endianness is; it is an artifact of the hardware. I think there exists no modern machine that supports 754 and can handle both little and big endian at the same time for different register types. – 2501 Nov 04 '16 at 09:57

chqrlie · Accepted Answer · 2016-11-04T07:42:38.477

6

You seem to have a bug in serialize_float: the last 4 lines should read:

buffer[(*index)++] = (res >> 24) & 0xFF;
buffer[(*index)++] = (res >> 16) & 0xFF;
buffer[(*index)++] = (res >> 8) & 0xFF;
buffer[(*index)++] = res & 0xFF;

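Your method might not work correctly for infinities and/or NaNs: frexpf returns them unchanged and stores an unspecified exponent, so the code never produces the all-ones exponent field they need (and converting an infinite value to uint32_t is undefined behaviour). Note that you can validate it by extensive testing: there are only about 4 billion values, so trying all possibilities should not take very long.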
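The exhaustive test could be sketched as follows. `encode_frexp` is a hypothetical helper that repeats the arithmetic of the question's `serialize_float32` but returns the 32-bit word instead of writing bytes. The reference is the host's own bit pattern obtained with `memcpy`, so this assumes the host `float` is IEEE 754 binary32. Inf and NaN are skipped, as in the question.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: same arithmetic as serialize_float32, no buffer. */
static uint32_t encode_frexp(float number) {
    int e = 0;
    float sig = frexpf(number, &e);
    float sig_abs = fabsf(sig);
    uint32_t sig_i = 0;

    if (sig_abs >= 0.5f) {
        sig_i = (uint32_t)((sig_abs - 0.5f) * 2.0f * 8388608.0f);
        e += 126;
    }

    uint32_t res = ((uint32_t)(e & 0xFF) << 23) | (sig_i & 0x7FFFFFu);
    if (sig < 0) {
        res |= (uint32_t)1 << 31;
    }
    return res;
}

/* Counts bit patterns in [first, last] whose encoding differs from the
   in-memory representation, skipping Inf and NaN. */
static uint64_t count_mismatches(uint32_t first, uint32_t last) {
    uint64_t mismatches = 0;
    uint32_t bits = first;

    for (;;) {
        float value;
        memcpy(&value, &bits, sizeof value);   /* assumes sizeof(float) == 4 */
        if (isfinite(value) && encode_frexp(value) != bits) {
            mismatches++;
        }
        if (bits == last) {
            break;      /* test before ++ so last == 0xFFFFFFFF terminates */
        }
        bits++;
    }
    return mismatches;
}
```

A full sweep is `count_mismatches(0, 0xFFFFFFFFu)`. Expect it to report some mismatches: the formula as written drops the sign of -0.0 (because `sig < 0` is false for it) and has no special case for subnormals, whose `frexpf` exponent falls below the normal range.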

The actual representation in memory of float values may differ on different architectures, but IEEE 754 (or more precisely IEC 60559) is largely prevalent today. You can check whether your particular targets are compliant by testing whether __STDC_IEC_559__ is defined. Note however that even if you can assume IEEE 754, you must handle potentially different endianness between the systems. You cannot assume the endianness of floats to be the same as that of integers on the same platform.
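As a rough sketch of both checks (function names hypothetical): the first reports whether the implementation advertises IEC 60559 through `__STDC_IEC_559__`, and the second compares the bytes of `1.0f` with those of its IEEE 754 encoding `0x3F800000` stored as a `uint32_t`. An absent macro does not prove the format differs, and a single-value probe is a heuristic, not a proof that every `float` shares the integer byte order.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the implementation advertises IEC 60559 floating point. */
int advertises_iec559(void) {
#if defined(__STDC_IEC_559__)
    return 1;
#else
    return 0;
#endif
}

/* Heuristic probe: do the bytes of 1.0f match those of uint32_t 0x3F800000? */
int float_matches_uint32_layout(void) {
    const float one = 1.0f;
    const uint32_t expected = 0x3F800000u;
    unsigned char float_bytes[sizeof one];
    unsigned char int_bytes[sizeof expected];

    if (sizeof one != sizeof expected) {
        return 0;
    }
    memcpy(float_bytes, &one, sizeof one);
    memcpy(int_bytes, &expected, sizeof expected);
    return memcmp(float_bytes, int_bytes, sizeof float_bytes) == 0;
}
```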

Note also that the simple cast would be incorrect: uint32_t res = *(uint32_t *)&number; violates the strict aliasing rule. You should either use a union or use memcpy(&res, &number, sizeof(res));
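The two alternatives side by side, as a minimal sketch (names hypothetical, 32-bit `float` assumed). Both are valid C; reading the inactive member of a union is, however, undefined behaviour in C++, where `memcpy` is the portable choice.

```c
#include <stdint.h>
#include <string.h>

/* memcpy-based punning: well-defined in both C and C++. */
uint32_t float_bits_memcpy(float number) {
    uint32_t res;
    memcpy(&res, &number, sizeof res);   /* assumes sizeof(float) == 4 */
    return res;
}

/* union-based punning: permitted in C, undefined behaviour in C++. */
uint32_t float_bits_union(float number) {
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = number;
    return pun.u;
}
```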

edited Nov 04 '16 at 07:42

answered Nov 04 '16 at 07:18

chqrlie


Thanks! It was a copy-paste error. I actually have an extra function for doing that, but for the question I put it in the same to make it easier to see what is going on. – Benjamin Vedder Nov 04 '16 at 07:23
Good point about the loop! I actually did that overnight with everything except Inf and NaN, and on my laptop it seems to work fine for all values. I don't know about other systems though. – Benjamin Vedder Nov 04 '16 at 07:46

The whole point of shifting is that it completely avoids endianness. – 2501 Nov 04 '16 at 07:57
@2501: of course. I am just telling the OP that endianness must be taken into account if he used a simpler serialization method, which he could do if he could assume the `float` representation to be IEEE 754. – chqrlie Nov 04 '16 at 08:15
A small note: accessing a not-most-recently-written-to member of a union is also [UB](http://en.cppreference.com/w/cpp/language/union). – Rostislav Nov 04 '16 at 12:52
@Rostislav: I'm afraid it is perfectly defined in this case: http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified/36705613#36705613 – chqrlie Nov 05 '16 at 15:55

Well, perhaps this is a difference between C and C++ standards. C++ standard 9.3 states `[ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (9.2), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see 9.2. — end note ]`. In this case, there's no common initial sequence. – Rostislav Nov 07 '16 at 11:37
