Portable serialisation of IEEE754 floating-point values

Question

I've recently been working on a system that needs to store and load large quantities of data, including single-precision floating-point values. I decided to standardise on network byte order for integers, and also decided to store floating point values in big-endian format, i.e.:

  |-- Byte 0 --| |-- Byte 1 -|  Byte 2   Byte 3
  #      ####### #     ####### ######## ########
Sign     Exponent          Mantissa
 1b    8b, MSB first    23b, MSB first

Ideally, I want to provide functions like htonl() and ntohl(), since I have already been using these for swabbing integers, and I also want to implement this in a way that has as much platform-independence as possible (while assuming that the float type corresponds to IEEE754 32-bit floating point values). Is there some way, possibly using ieee754.h, to do this?

I have one answer that seems to work, and I will post it below, but it seems pretty slow and inefficient and I would appreciate any suggestions about how to make it faster and/or more reliable.

What about this: http://stackoverflow.com/a/2782742/1327576 ? — smocking, May 16 '12 at 14:37
I looked at that answer, and clearly it depends on the assumption that the host representation is little-endian. I'm looking for something that's host-byte-order-agnostic. — Peter T.B. Brett, May 16 '12 at 14:40
Arguably `snprintf(b, sizeof(b), "%.9001f", yourvalue)` (text-based representation) is most portable. — jørgensen, May 16 '12 at 16:33
Arguably! Unfortunately, as mentioned in the question, I'm saving and loading very large quantities of data. I started off with textual representation, as you suggest, but it was too slow to `printf` and `scanf` the billions of data items, and the resulting files were too large. But you're quite right to point this option out. :-) — Peter T.B. Brett, May 16 '12 at 16:46

Stephen Canon · Accepted Answer · 2012-05-16T15:19:50.300

8

Much simpler, and depending on the same assumption as yours (which is that float and integer types have the same byte order, and is almost universally valid -- realistically you'll never encounter a system where it isn't true):

#include <string.h>

float htonf(float val) {
    uint32_t rep;
    memcpy(&rep, &val, sizeof rep);
    rep = htonl(rep);
    memcpy(&val, &rep, sizeof rep);
    return val;
}

Any reasonably good compiler will optimize away the two memcpy calls; they are present to defeat over-eager strict aliasing optimizations, so this ends up being as efficient as htonl plus the overhead of a single function call.

edited May 16 '12 at 15:19

answered May 16 '12 at 15:10

Stephen Canon

103,815
19
183
269

Thanks for your answer. Just to make sure I understand -- is the byte ordering assumption in my answer due to using `ieee754.h`? – Peter T.B. Brett May 16 '12 at 15:14
1

Yes, `ieee754.h` assumes that -- or at least, all implementations that I've seen do (and as I said, it might as well be a universally valid assumption in our modern era). – Stephen Canon May 16 '12 at 15:17
1

Don't some ARM architectures have some weird mixed-endian craziness going on? Anyway, thanks again for your answer -- it's a lot easier to understand than mine, at the very least! – Peter T.B. Brett May 16 '12 at 15:22
@PeterBrett: there have been a number of "crazy mixed endian architectures" over the years; thankfully, everyone seems to have figured out that it's a bad idea. – Stephen Canon May 16 '12 at 15:24
Yes, some older ARM systems (the ones that used the "FPA" math coprocessor) were (normally) little-endian for integers but always big-endian for floating point. As far as I know, all modern ARM processors with hardware floating point use the VFP math unit instead, which matches integer endianness. However, GCC does still support the FPA -- possibly just because nobody's gotten around to removing it yet. – zwol May 16 '12 at 15:28
`long med = htonl(*(int*)&val);return *(float*)&(med);` wont suffice? Would you mind explaining those optimizations and how memcpy defeats them? – Agent_L May 16 '12 at 15:28
1

@Agent_L That code triggers undefined behavior. You are dereferencing a pointer which points to a value which is not of the pointer's type. C compilers are allowed to render this *arbitrarily* not as you intended -- two things I have actually seen happen are that the compiler substitutes some constant value (usually zero) for the load, or it replaces the load with a trap instruction (i.e. insta-crash when called). – zwol May 16 '12 at 15:31
@Zack That's interesting, what compilers are those? Often ppl ask how to demonstrate the dangers of UB, but it turns out to be quite well-defined in practice. – Agent_L May 16 '12 at 15:41
1

@Agent_L: There are platforms where the ABI requires stronger (or weaker) alignment for floats than for integers of the same size. On such a platform, the dereferencing a type-punned pointer will easily cause a crash. – Stephen Canon May 16 '12 at 15:43
1

@Agent_L: In addition, some compilers (such as `gcc` with `-fstrict-aliasing`) use strict aliasing assumptions to enable certain classes of optimisation, and will bitch like mad if you invoke undefined aliasing behaviour. For example, see [this answer](http://stackoverflow.com/questions/2958633/gcc-strict-aliasing-and-horror-stories). – Peter T.B. Brett May 16 '12 at 15:55
@Agent_L Thanks to aggressive pushback from prominent know-nothings, a lot of compilers have made the construct you showed do "what it's supposed to do" even though it's UB per the standard. I don't remember exactly when GCC started and stopped "mis"-compiling that; I would try versions in the 4.0 through 4.4 range and see what happens. – zwol May 16 '12 at 16:26
@Zack I'm asking because the standard keeps shouting "wolf, wolf" where there is no wolf. This only makes ppl believe that UB is nothing bad. They come to SO for demonstration of dangers of relying on UB, but it's like UFO. Everyone knows someone who saw it, but no one saw it with their own eyes. So far Stephen's point about strict float alignment is the only reasonable (yet still imaginary without pointing out exact hardware) example. – Agent_L May 22 '12 at 12:58
2

@Agent_L: I've seen it with my own eyes, causing bugs in a library I worked on. GCC 4.0 would, under the right conditions, optimize `int32_t x = *(int32_t *)&float;` into the equivalent of `int32_t x;` (i.e. `x` would simply be uninitialized). Seriously, use `memcpy`. It is just as easy, is explicitly guaranteed to be correct, and will generate identical code when using reasonable compilers. What's the downside? – Stephen Canon May 22 '12 at 14:23
1

@Agent_L: compile `float f(int *x, float *y) { float k = *y; *x = 32; return *y + k; }` with `-O2 -fstrict-aliasing` in recent GCC (confirmed with both Apple gcc 4.2 and FSF gcc 4.6) and it will optimize on the assumption `y != x`. – zwol May 22 '12 at 22:28
1

Would not byte swapping a `float` potentially form a NAN from a finite `x`, thus the calling function would need to handle the result _very_ carefully as not to trip some NAN detection? Would not is be safer for the result to be a uint32_t or a 4-byte structure call `taolf` or something less fragile? – chux - Reinstate Monica Nov 24 '13 at 05:40

score 1 · Answer 2 · answered May 16 '12 at 14:30

1

As mentioned in the question above, I have a solution to my problem, but I'm not particularly attached to it, and I welcome other answers, so I'm posting it here rather than in the question. In particular, it seems likely to be slow, and I'm not sure whether it breaks strict aliasing, among other potential problems.

#include <ieee754.h>

float
htonf (float val)
{
  union ieee754_float u;
  float v;
  uint8_t *un = (uint8_t *) &v;

  u.f = val;
  un[0] = (u.ieee.negative << 7) + ((u.ieee.exponent & 0xfe) >> 1);
  un[1] = ((u.ieee.exponent & 0x01) << 7) + ((u.ieee.mantissa & 0x7f0000) >> 16);
  un[2] = (u.ieee.mantissa & 0xff00) >> 8;
  un[3] = (u.ieee.mantissa & 0xff);
  return v;
}

float
ntohf (float val)
{
  union ieee754_float u;
  uint8_t *un = (uint8_t *) &val;

  u.ieee.negative = (un[0] & 0x80) >> 7;
  u.ieee.exponent = (un[0] & 0x7f) << 1;
  u.ieee.exponent += (un[1] & 0x80) >> 7;
  u.ieee.mantissa = (un[1] & 0x7f) << 16;
  u.ieee.mantissa += un[2] << 8;
  u.ieee.mantissa += un[3];

  return u.f;
}

answered May 16 '12 at 14:30

Peter T.B. Brett

1,250
11
20

1

Thanks for the vote of confidence. Did I mention that I am dealing with a *lot* of data? ;-) What makes it look fast to you? – Peter T.B. Brett May 16 '12 at 14:38
It doesn't use expensive operations, no loops, no jumps. And I cannot imagine how you could get away with significantly less operations. But I'm at least as curious as you to see better proposals. – undur_gongor May 16 '12 at 14:53
2

My immediate reaction is that this is probably pretty fast as well. "fast" is a relative term, so I'll clarify what I mean by that. I think the CPU will be able to convert data a lot faster than the network can transmit it. Even ignoring the network, conversion speed will probably exceed bandwidth to main memory. – Jerry Coffin May 16 '12 at 15:14
@JerryCoffin: Even if you can't make the network transfer faster, you can let the processor finish any given batch of work sooner, which lets it return to a low-power state and save energy. – Stephen Canon May 16 '12 at 17:12

score 1 · Answer 3 · answered Feb 14 '14 at 12:44

Here's a portable IEEE 754 write routine. It will write a double in IEEE 754 format, regardless of the floating point representation on the host machine.

/*
* write a double to a stream in ieee754 format regardless of host
*  encoding.
*  x - number to write
*  fp - the stream
*  bigendian - set to write big bytes first, elee write litle bytes
*              first
*  Returns: 0 or EOF on error
*  Notes: different NaN types and negative zero not preserved.
*         if the number is too big to represent it will become infinity
*         if it is too small to represent it will become zero.
*/
static int fwriteieee754(double x, FILE *fp, int bigendian)
{
    int shift;
    unsigned long sign, exp, hibits, hilong, lowlong;
    double fnorm, significand;
    int expbits = 11;
    int significandbits = 52;

    /* zero (can't handle signed zero) */
    if (x == 0)
    {
        hilong = 0;
        lowlong = 0;
        goto writedata;
    }
    /* infinity */
    if (x > DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 0;
        goto writedata;
    }
    /* -infinity */
    if (x < -DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (1 << 31);
        lowlong = 0;
        goto writedata;
    }
    /* NaN - dodgy because many compilers optimise out this test, but
    *there is no portable isnan() */
    if (x != x)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 1234;
        goto writedata;
    }

    /* get the sign */
    if (x < 0) { sign = 1; fnorm = -x; }
    else { sign = 0; fnorm = x; }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while (fnorm < 1.0) { fnorm *= 2.0; shift--; }

    /* check for denormalized numbers */
    if (shift < -1022)
    {
        while (shift < -1022) { fnorm /= 2.0; shift++; }
        shift = -1023;
    }
    /* out of range. Set to infinity */
    else if (shift > 1023)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (sign << 31);
        lowlong = 0;
        goto writedata;
    }
    else
        fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

    /* calculate the integer form of the significand */
    /* hold it in a  double for now */

    significand = fnorm * ((1LL << significandbits) + 0.5f);


    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1); /* shift + bias */

    /* put the data into two longs (for convenience) */
    hibits = (long)(significand / 4294967296);
    hilong = (sign << 31) | (exp << (31 - expbits)) | hibits;
    x = significand - hibits * 4294967296;
    lowlong = (unsigned long)(significand - hibits * 4294967296);

writedata:
    /* write the bytes out to the stream */
    if (bigendian)
    {
        fputc((hilong >> 24) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc(hilong & 0xFF, fp);

        fputc((lowlong >> 24) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc(lowlong & 0xFF, fp);
    }
    else
    {
        fputc(lowlong & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 24) & 0xFF, fp);

        fputc(hilong & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}

Portable serialisation of IEEE754 floating-point values

3 Answers3

Linked