2

I have a library that saves large amounts of floating-point data to disk in text form. It seems this was done for portability, but because of the huge disk usage I've written a function that saves the binary representation of the floating-point values directly to disk. I know this doesn't guarantee 100% portability, but I'll only run it on x86(_64) Linux/Windows PCs (maybe also on Mac and the BSDs).

Is there a way to at least check whether the floating point format the program understands is also okay with the system? And how much incompatibility should I expect from dealing with floating point data in binary form?

  • In a limited fashion. The `__STDC_IEC_559__` macro expands to `1` if IEC 60559 is supported. But if you intend to pass the file between platforms, it won't be much use to you. You'd still need to use a platform-agnostic format. – StoryTeller - Unslander Monica Feb 08 '17 at 12:25
  • Difficult to say. In theory the C standard doesn't mandate a binary representation for numbers, meaning that any compiler or system is free to use whatever it likes. IEEE 754 defines a binary floating-point format, and using that encoding should at least give enough compatibility, but it is difficult to be sure that it is supported on every system. On the plus side, it is possible to write a function to convert between formats. – Frankie_C Feb 08 '17 at 12:30

4 Answers

2

Is there a way to at least check whether the floating point format the program understands is also okay with the system?

Test 1: check that `sizeof` of the floating-point type matches. Test 2: save a magic floating-point value in the header of your on-disk file and check in the program that it has the right value after you've read the binary data from the disk. This should be safe enough.
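
The two tests above could be sketched roughly as follows. This is only an illustration: the header layout (one size byte followed by one `double`), the function names `write_fp_header` and `check_fp_header`, and the check value `FP_MAGIC` are all assumptions made for the example, not part of any existing library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check value; any non-trivial constant would do. */
#define FP_MAGIC 3.141592653589793

/* Write a tiny header: sizeof(double) as one byte, then the check value
   in the host's native representation. Returns 0 on success, -1 on error. */
static int write_fp_header(FILE *fp)
{
    unsigned char size = (unsigned char)sizeof(double);
    double magic = FP_MAGIC;

    if (fwrite(&size, 1, 1, fp) != 1) return -1;
    if (fwrite(&magic, sizeof magic, 1, fp) != 1) return -1;
    return 0;
}

/* Read the header back. Returns 0 if the stored size and check value match
   what this host would have written, -1 otherwise. */
static int check_fp_header(FILE *fp)
{
    unsigned char size;
    double magic;
    const double expected = FP_MAGIC;

    if (fread(&size, 1, 1, fp) != 1) return -1;
    if (size != sizeof(double)) return -1;              /* test 1: sizeof */
    if (fread(&magic, sizeof magic, 1, fp) != 1) return -1;
    /* test 2: compare the raw bytes rather than the values, so that a
       different encoding cannot pass by accident */
    return memcmp(&magic, &expected, sizeof magic) == 0 ? 0 : -1;
}
```

A reader would call `check_fp_header` right after opening the file and refuse to load (or fall back to a conversion routine) if it returns non-zero.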

And how much incompatibility should I expect from dealing with floating point data in binary form?

Very little. If, as you're saying, you're staying with just one hardware architecture (x86), you'll be fine. If you have a limited set of supported architectures - just test all of them. On x86 everyone will be using hardware floating point which limits how creative they can be (pretty much not at all). Even between architectures everyone I know of who uses IEEE 754 floating point has the same binary representation for the same endianness.

Floating-point numbers have the weird problem that there isn't a widely used standard for their binary on-disk/on-wire representation. That being said, everyone I've looked at does one of two things: either use strings, or store the bit pattern in an equally sized integer, adjust for endianness, and brutally cast to float.
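
A minimal sketch of the second approach, assuming `double` is a 64-bit IEEE 754 type and that the host stores doubles and 64-bit integers in the same byte order. It uses `memcpy` instead of a pointer cast to move the bit pattern, and always emits the least significant byte first. The names `encode_double_le` and `decode_double_le` are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Store the bit pattern of d in out[0..7], least significant byte first.
   Assumes a 64-bit IEEE 754 double (an assumption, not checked here). */
static void encode_double_le(double d, unsigned char out[8])
{
    uint64_t bits;
    int i;

    memcpy(&bits, &d, sizeof bits);      /* copy the bits, no aliasing issue */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (8 * i));
}

/* Rebuild a double from 8 bytes written by encode_double_le. */
static double decode_double_le(const unsigned char in[8])
{
    uint64_t bits = 0;
    double d;
    int i;

    for (i = 0; i < 8; i++)
        bits |= (uint64_t)in[i] << (8 * i);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Because the shifts operate on the integer value rather than on memory, the byte order of the output does not depend on the endianness of the machine that runs the code.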

Art
  • Good answer in general, only one flaw: Brutally casting to float will yield undefined behavior under the strict aliasing rules if you are casting a pointer to `int` or similar. There are exactly two ways around this: Only cast to/from a `char` pointer and copy the bytes, or use `memcpy()` instead of a cast. The latter is easier and usually translates to better code. – cmaster - reinstate monica Feb 08 '17 at 13:42
  • 1
    @cmaster I said brutally cast, I probably meant "convert the binary pattern using whatever method you're comfortable with". I'd cast. Sure, standard says we shouldn't cast. But I usually judge those things by seeing what company I'll be in when some non-standard trick breaks. And in this case the company is pretty much everybody. The standard has very good reasons to make it undefined because it is totally unportable. On the other hand compilers have very good reasons to never break it because they gain nothing from doing it and everyone would hate them for it. – Art Feb 08 '17 at 14:01
  • 1
    1. In that case, you are not in my company. When I know something is UB, I try to avoid it. It feels just a tad too reckless not to. 2. The UB in question arises from strict aliasing rules: The compiler is free to move the write to the `int` after the load of the `float`. That is the whole point of introducing strict aliasing rules: To create more situations where the compiler is allowed to move a write/load. **Unless the compiler writers implement explicit exceptions to their optimizers to save your code from breaking, any update of the compiler may break it.** – cmaster - reinstate monica Feb 08 '17 at 14:21
2

Look up the binary portability website. https://github.com/MalcolmMcLean/ieee754

The function to write an IEEE 754 portably is quite long, but it's just a cut and paste job. There's also a float version.

#include <float.h>  /* DBL_MAX */
#include <stdio.h>  /* FILE, fputc, ferror */

/*
* write a double to a stream in ieee754 format regardless of host
*  encoding.
*  x - number to write
*  fp - the stream
*  bigendian - set to write big bytes first, else write little bytes
*              first
*  Returns: 0 on success, nonzero on error
*  Notes: different NaN types and negative zero not preserved.
*         if the number is too big to represent it will become infinity
*         if it is too small to represent it will become zero.
*/
int fwriteieee754(double x, FILE *fp, int bigendian)
{
    int shift;
    unsigned long sign, exp, hibits, hilong, lowlong;
    double fnorm, significand;
    int expbits = 11;
    int significandbits = 52;

    /* zero (can't handle signed zero) */
    if (x == 0)
    {
        hilong = 0;
        lowlong = 0;
        goto writedata;
    }
    /* infinity */
    if (x > DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 0;
        goto writedata;
    }
    /* -infinity */
    if (x < -DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (1UL << 31);
        lowlong = 0;
        goto writedata;
    }
    /* NaN - dodgy because some compilers optimise out this test (for
    * example under fast-math options); C99 adds isnan() in <math.h>,
    * but C89 has no portable equivalent */
    if (x != x)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 1234;
        goto writedata;
    }

    /* get the sign */
    if (x < 0) { sign = 1; fnorm = -x; }
    else { sign = 0; fnorm = x; }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while (fnorm < 1.0) { fnorm *= 2.0; shift--; }

    /* check for denormalized numbers */
    if (shift < -1022)
    {
        while (shift < -1022) { fnorm /= 2.0; shift++; }
        shift = -1023;
    }
    /* out of range. Set to infinity */
    else if (shift > 1023)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (sign << 31);
        lowlong = 0;
        goto writedata;
    }
    else
        fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

    /* calculate the integer form of the significand */
    /* hold it in a  double for now */

    significand = fnorm * ((1LL << significandbits) + 0.5f);


    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1); /* shift + bias */

    /* put the data into two longs (for convenience) */
    hibits = (long)(significand / 4294967296);
    hilong = (sign << 31) | (exp << (31 - expbits)) | hibits;
    lowlong = (unsigned long)(significand - hibits * 4294967296);

writedata:
    /* write the bytes out to the stream */
    if (bigendian)
    {
        fputc((hilong >> 24) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc(hilong & 0xFF, fp);

        fputc((lowlong >> 24) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc(lowlong & 0xFF, fp);
    }
    else
    {
        fputc(lowlong & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 24) & 0xFF, fp);

        fputc(hilong & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}
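
For the reading side, a decoder can likewise be written using only arithmetic, so that it does not depend on the host's own floating-point layout. The following is a sketch written for this discussion, not code taken from the linked repository; the name `decode_ieee754_be` and the helper `scale_pow2` are hypothetical. It takes 8 bytes in big-endian order, and uses simple loops for the power-of-two scaling in the same spirit as the writer above (`ldexp()` from `<math.h>` would be the more usual choice).

```c
#include <math.h>   /* INFINITY and NAN macros (C99) */

/* Multiply v by 2^e using repeated doubling or halving. */
static double scale_pow2(double v, int e)
{
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v *= 0.5; e++; }
    return v;
}

/* Decode an IEEE 754 binary64 value stored most significant byte first. */
static double decode_ieee754_be(const unsigned char b[8])
{
    int sign = b[0] >> 7;
    int exponent = ((b[0] & 0x7F) << 4) | (b[1] >> 4);   /* 11 bits */
    double frac = (double)(b[1] & 0x0F);                  /* top 4 fraction bits */
    double value;
    int i;

    /* accumulate the remaining 48 fraction bits as an integer in a double */
    for (i = 2; i < 8; i++)
        frac = frac * 256.0 + (double)b[i];

    if (exponent == 0x7FF) {
        /* all-ones exponent: infinity if the fraction is zero, else NaN */
        value = (frac == 0.0) ? (double)INFINITY : (double)NAN;
    } else if (exponent == 0) {
        /* zero or subnormal: no implicit leading 1, exponent fixed at -1022 */
        value = scale_pow2(frac, -1022 - 52);
    } else {
        /* normal number: add the implicit leading 1 (2^52), remove bias 1023 */
        value = scale_pow2(frac + 4503599627370496.0, exponent - 1023 - 52);
    }
    return sign ? -value : value;
}
```

If the file was written with `bigendian` set to 0, the caller would reverse the 8 bytes before passing them in.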
Stargateur
Malcolm McLean
1

You can look at the new (C11) and old macros in the header `<float.h>`; see section 5.2.4.2.2, "Characteristics of floating types" (page 46).
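
As a rough illustration of how those macros might be used, the sketch below compares the `<float.h>` parameters against the values that IEEE 754 binary64 and binary32 imply. The function names are invented for this example, and passing these checks says nothing about byte order, so it complements a check value in the file rather than replacing it.

```c
#include <float.h>
#include <limits.h>

/* Returns 1 if double has the radix, precision, exponent range and size
   of IEEE 754 binary64, 0 otherwise. */
static int double_looks_like_binary64(void)
{
    return FLT_RADIX == 2
        && DBL_MANT_DIG == 53
        && DBL_MIN_EXP == -1021
        && DBL_MAX_EXP == 1024
        && sizeof(double) * CHAR_BIT == 64;
}

/* Same idea for float and IEEE 754 binary32. */
static int float_looks_like_binary32(void)
{
    return FLT_RADIX == 2
        && FLT_MANT_DIG == 24
        && FLT_MIN_EXP == -125
        && FLT_MAX_EXP == 128
        && sizeof(float) * CHAR_BIT == 32;
}

/* Returns 1 if the implementation defines __STDC_IEC_559__. Some compilers
   use IEEE 754 formats without defining it, so 0 is not conclusive. */
static int claims_iec_559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```

Since every operand is a compile-time constant, the same conditions could also be placed in `#if` / `#error` directives (apart from the `sizeof` terms) to reject an unsupported platform at build time.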

Stargateur
  • In other words, even if you don't have this (since you're not in C11) you can re-implement the same functionality and tag the file with the relevant values, thus describing the meaning of the bits and making it possible to be compatible even on hardware with a non-matching format. You could of course also make the header a single well-defined float, and check for that (I suggest π). – unwind Feb 08 '17 at 12:28
  • Which particular macro are you referring to (or did you mean "macros"?), and what do you mean by "new"? `DBL_EPSILON`, `DBL_MAX`, `DBL_MIN` have been around since C89. – Mark Dickinson Feb 08 '17 at 13:49
  • @MarkDickinson This is the only thing that I found in the standard about floating-point numbers. There is no solution to this problem in the standard, but it's a start. "— additional floating-point characteristics in `<float.h>`", C11. – Stargateur Feb 08 '17 at 14:08
  • Okay, thanks. Yes, C11 added some extra things, like `DBL_TRUE_MIN` and `DBL_HAS_SUBNORM`. – Mark Dickinson Feb 08 '17 at 14:14
-1

In general you should be fine to directly read and write the binary data: the IEEE 754 binary interchange format is pretty much standard outside of a few niche areas. You can use the `__STDC_IEC_559__` macro to check.

As noted in this question, one thing the spec does not specify is the precise mapping of bits to bytes, so there is potential for endianness issues (though probably not if you're exclusively using x86/x86_64). It might be a good idea to include a floating-point check value at the start of your stream (note that it is not sufficient to check the endianness of your integers, as it is technically possible for integers and floating-point values to have different endianness).

If you're writing text, one alternative to consider is the hex float format, which can be much faster to read/write than decimal formats (though not as fast as the raw binary interchange format). Unfortunately, though it is part of both the IEEE and C99 specs, it has been poorly supported by the MSVC compiler (though this may change now that it is part of C++).
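
For reference, a small sketch of the hex float route using the C99 `%a` conversion and `strtod()`. The wrapper names `format_hex_double` and `parse_hex_double` are assumptions for the example. The exact text produced by `%a` can differ between C libraries, but any conforming `strtod()` should read it back to the same value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format d as a hexadecimal floating-point string (C99 "%a").
   Returns the value returned by snprintf. */
static int format_hex_double(double d, char *buf, size_t len)
{
    return snprintf(buf, len, "%a", d);
}

/* Parse a string produced by format_hex_double (or any text accepted by
   strtod, which understands hex floats since C99). */
static double parse_hex_double(const char *s)
{
    return strtod(s, NULL);
}
```

Because the hexadecimal digits map directly onto the binary significand, the round trip is exact and needs no decimal conversion in either direction.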

Community
Simon Byrne