C - Serialization of the floating point numbers (floats, doubles)

Question

How to convert a floating point number into a sequence of bytes so that it can be persisted in a file? Such algorithm must be fast and highly portable. It must allow also the opposite operation, deserialization. It would be nice if only very tiny excess of bits per value (persistent space) is required.

it must be independent on the underlying architecture, e.g. it can be ARM-7, PowerPC, Microblaze, OpenRISC or just x86. — psihodelia, Nov 23 '09 at 22:05
Some questions are interesting even in case they happen to have emerged from homework. Simply banning all facts and topics on this site if they ever were the subject of anybody's homework would mean to erase half of it, I assume... — yeoman, Apr 10 '17 at 09:02

score 14 · Accepted Answer · answered Nov 23 '09 at 22:10

14

Assuming you're using mainstream compilers, floating point values in C and C++ obey the IEEE standard and when written in binary form to a file can be recovered in any other platform, provided that you write and read using the same byte endianess. So my suggestion is: pick an endianess of choice, and before writing or after reading, check if that endianess is the same as in the current platform; if not, just swap the bytes.

answered Nov 23 '09 at 22:10

Fabio Ceconello

15,819
5
38
51

according to the C99 spec, annex F, conforming implementations should define `__STDC_IEC_559__`, which in principle could be used as a compile-time check, but is useless in practice as there are issues with gcc ( http://gcc.gnu.org/c99status.html , scroll down to 'Further Issues') – Christoph Nov 23 '09 at 22:37
Compiler's don't necessarily dictate the IEEE floating point format. There are still computers which use other formats unfortunately (VAX/Alpha, IBM). But +1 ensuring you have the endianness right. – user7116 Nov 26 '09 at 02:26
1

Right, but they have to know the format used by the platform to support it in the RTL. Also, many platforms (these days especially embedded) don't have a math coprocessor, so they do dictate the format in the accompanying emulation lib. So I thought it'd be easier to refer to the compiler. – Fabio Ceconello Nov 26 '09 at 23:24
2

Isn't the case to treat those platforms that don't support the IEEE standard as exceptions, and when the (rare) version for them is needed, just do the necessary conversions only there? Here's a good article about the differences: http://www.codeproject.com/KB/applications/libnumber.aspx – Fabio Ceconello Nov 26 '09 at 23:26

score 3 · Answer 2 · edited Nov 29 '21 at 16:36

3

You could always convert to IEEE-754 format in a fixed byte order (either little endian or big endian). For most machines, that would require either nothing at all or a simple byte swap to serialize and deserialize. A machine that doesn't support IEEE-754 natively will need a converter written, but doing that with ldexp and frexp (standard C library functions)and bit shuffling is not too tough.

edited Nov 29 '21 at 16:36

Jarod42

203,559
14
181
302

answered Nov 23 '09 at 22:29

Chris Dodd

2,920
15
10

The problem comes with FP standards that lack some of the "features" of IEEE. Namely the VAX and IBM floating point formats...You're in for a world of hurt w.r.t. corner cases. Thankfully, people have written excellent converters which handle these cases gracefully (I'm looking at you USGS! I owe you a beer). – user7116 Nov 26 '09 at 02:28
An ANSI compliant frexp function hides most of that for you. Of course, you may end up with cases where serialization and deserialization gives you a (close but) different value. – Chris Dodd Nov 30 '09 at 18:35

score 3 · Answer 3 · answered Nov 24 '09 at 00:06

This might give you a good start - it packs a floating point value into an int and long long pair, which you can then serialise in the usual way.

#define FRAC_MAX 9223372036854775807LL /* 2**63 - 1 */

struct dbl_packed
{
    int exp;
    long long frac;
};

void pack(double x, struct dbl_packed *r)
{
    double xf = fabs(frexp(x, &r->exp)) - 0.5;

    if (xf < 0.0)
    {
        r->frac = 0;
        return;
    }

    r->frac = 1 + (long long)(xf * 2.0 * (FRAC_MAX - 1));

    if (x < 0.0)
        r->frac = -r->frac;
}

double unpack(const struct dbl_packed *p)
{
    double xf, x;

    if (p->frac == 0)
        return 0.0;

    xf = ((double)(llabs(p->frac) - 1) / (FRAC_MAX - 1)) / 2.0;

    x = ldexp(xf + 0.5, p->exp);

    if (p->frac < 0)
        x = -x;

    return x;
}

score 2 · Answer 4 · edited Jun 20 '20 at 09:12

What do you mean, "portable"?

For portability, remember to keep the numbers within the limits defined in the Standard: use a single number outside these limits, and there goes all portability down the drain.

double planck_time = 5.39124E-44; /* second */

5.2.4.2.2 Characteristics of floating types <float.h>

[...]
10   The values given in the following list shall be replaced by constant
     expressions with implementation-defined values [...]
11   The values given in the following list shall be replaced by constant
     expressions with implementation-defined values [...]
12   The values given in the following list shall be replaced by constant
     expressions with implementation-defined (positive) values [...]
[...]

Note the implementation-defined in all these clauses.

score 1 · Answer 5 · answered Nov 23 '09 at 21:55

Converting to an ascii representation would be the simplest, but if you need to deal with a colossal number of floats, then of course you should go binary. But this can be a tricky issue if you care about portability. Floating point numbers are represented differently in different machines.

If you don't want to use a canned library, then your float-binary serializer/deserializer will simply have to have "a contract" on where each bit lands and what it represents.

Here's a fun website to help with that: link.

score 0 · Answer 6 · answered Nov 23 '09 at 21:34

0

sprintf, fprintf ? you don't get any more portable than that.

answered Nov 23 '09 at 21:34

2

it is not effective solution, it requires much more persistent space than for the same numbers represented in RAM – psihodelia Nov 23 '09 at 21:39
1

What's not effective about it; there are potentially serious complications with trying to save the floating point numbers directly, doing it as strings is pretty much standard operating procedure. – Carl Norum Nov 23 '09 at 21:42
I would prefer smth. like an excess bit indicating an endiannes, an excess bit-or-two indicating a number of bytes per floating number. maybe some excess bits to indicate a mantisse/exponent type (e.g. IEEE754-2008) – psihodelia Nov 23 '09 at 21:50
3

Well, why don't you just do that then? – Steve Jessop Nov 23 '09 at 21:57
5

It may require more space, but it's both human readable and machine readable, endian-agnostic, and theoretically limitless with regards to the precision required. – dreamlax Nov 24 '09 at 11:22
2

More importantly @dreamlax, it is Floating Point Format agnostic. – user7116 Nov 26 '09 at 02:24
4

This looks good at first, but has an implication that can be serious (or not, depending on use): You cannot always store a float in decimal format. That means that storing in ascii decimal-format, no matter how many decimal digits you add, you cannot guarantee that the numbers read back will be the same as the numbers that were stored. – Gerasimos R Apr 04 '13 at 13:45

score 0 · Answer 7 · answered Nov 23 '09 at 21:50

What level of portability do you require? If the file is to be read on a computer with the same OS that it was generated on, than you using a binary file and just saving and restoring the bit pattern should work. Otherwise as boytheo said, ASCII is your friend.

score 0 · Answer 8 · answered Nov 24 '09 at 10:50

This version has excess of only one byte per one floating point value to indicate the endianness. But I think, it is still not very portable however.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

#define LITEND      'L'
#define BIGEND      'B'

typedef short               INT16;
typedef int                 INT32;
typedef double              vec1_t;

 typedef struct {
    FILE            *fp;
} WFILE, RFILE;

#define w_byte(c, p)    putc((c), (p)->fp)
#define r_byte(p)       getc((p)->fp)

static void w_vec1(vec1_t v1_Val, WFILE *p)
{
    INT32   i;
    char    *pc_Val;

    pc_Val = (char *)&v1_Val;

    w_byte(LITEND, p);
    for (i = 0; i<sizeof(vec1_t); i++)
    {
        w_byte(pc_Val[i], p);
    }
}


static vec1_t r_vec1(RFILE *p)
{
    INT32   i;
    vec1_t  v1_Val;
    char    c_Type,
            *pc_Val;

    pc_Val = (char *)&v1_Val;

    c_Type = r_byte(p);
    if (c_Type==LITEND)
    {
        for (i = 0; i<sizeof(vec1_t); i++)
        {
            pc_Val[i] = r_byte(p);
        }
    }
    return v1_Val;
}

int main(void)
{
    WFILE   x_FileW,
            *px_FileW = &x_FileW;
    RFILE   x_FileR,
            *px_FileR = &x_FileR;

    vec1_t  v1_Val;
    INT32   l_Val;
    char    *pc_Val = (char *)&v1_Val;
    INT32   i;

    px_FileW->fp = fopen("test.bin", "w");
    v1_Val = 1234567890.0987654321;
    printf("v1_Val before write = %.20f \n", v1_Val);
    w_vec1(v1_Val, px_FileW);
    fclose(px_FileW->fp);

    px_FileR->fp = fopen("test.bin", "r");
    v1_Val = r_vec1(px_FileR);
    printf("v1_Val after read = %.20f \n", v1_Val);
    fclose(px_FileR->fp);
    return 0;
}

It is portable only to machines sharing the same floating point format. Having been down this road, I will give you the following advice: **Standardize on Little Endian IEEE-754 and make everybody else convert to/from that if necessary**. You will be MUCH happier in the end. You will have portability through a rigid standard. — user7116, Nov 26 '09 at 02:31

Malcolm McLean · Answer 9 · 2016-08-27T12:12:32.623

Here we go.

Portable IEEE 754 serialisation / deserialisation that should work regardless of the machine's internal floating point representation.

https://github.com/MalcolmMcLean/ieee754

/*
* read a double from a stream in ieee754 format regardless of host
*  encoding.
*  fp - the stream
*  bigendian - set to if big bytes first, clear for little bytes
*              first
*
*/
double freadieee754(FILE *fp, int bigendian)
{
    unsigned char buff[8];
    int i;
    double fnorm = 0.0;
    unsigned char temp;
    int sign;
    int exponent;
    double bitval;
    int maski, mask;
    int expbits = 11;
    int significandbits = 52;
    int shift;
    double answer;

    /* read the data */
    for (i = 0; i < 8; i++)
        buff[i] = fgetc(fp);
    /* just reverse if not big-endian*/
    if (!bigendian)
    {
        for (i = 0; i < 4; i++)
        {
            temp = buff[i];
            buff[i] = buff[8 - i - 1];
            buff[8 - i - 1] = temp;
        }
    }
    sign = buff[0] & 0x80 ? -1 : 1;
    /* exponet in raw format*/
    exponent = ((buff[0] & 0x7F) << 4) | ((buff[1] & 0xF0) >> 4);

    /* read inthe mantissa. Top bit is 0.5, the successive bits half*/
    bitval = 0.5;
    maski = 1;
    mask = 0x08;
    for (i = 0; i < significandbits; i++)
    {
        if (buff[maski] & mask)
            fnorm += bitval;

        bitval /= 2.0;
        mask >>= 1;
        if (mask == 0)
        {
            mask = 0x80;
            maski++;
        }
    }
    /* handle zero specially */
    if (exponent == 0 && fnorm == 0)
        return 0.0;

    shift = exponent - ((1 << (expbits - 1)) - 1); /* exponent = shift + bias */
    /* nans have exp 1024 and non-zero mantissa */
    if (shift == 1024 && fnorm != 0)
        return sqrt(-1.0);
    /*infinity*/
    if (shift == 1024 && fnorm == 0)
    {

#ifdef INFINITY
        return sign == 1 ? INFINITY : -INFINITY;
#endif
        return  (sign * 1.0) / 0.0;
    }
    if (shift > -1023)
    {
        answer = ldexp(fnorm + 1.0, shift);
        return answer * sign;
    }
    else
    {
        /* denormalised numbers */
        if (fnorm == 0.0)
            return 0.0;
        shift = -1022;
        while (fnorm < 1.0)
        {
            fnorm *= 2;
            shift--;
        }
        answer = ldexp(fnorm, shift);
        return answer * sign;
    }
}


/*
* write a double to a stream in ieee754 format regardless of host
*  encoding.
*  x - number to write
*  fp - the stream
*  bigendian - set to write big bytes first, elee write litle bytes
*              first
*  Returns: 0 or EOF on error
*  Notes: different NaN types and negative zero not preserved.
*         if the number is too big to represent it will become infinity
*         if it is too small to represent it will become zero.
*/
int fwriteieee754(double x, FILE *fp, int bigendian)
{
    int shift;
    unsigned long sign, exp, hibits, hilong, lowlong;
    double fnorm, significand;
    int expbits = 11;
    int significandbits = 52;

    /* zero (can't handle signed zero) */
    if (x == 0)
    {
        hilong = 0;
        lowlong = 0;
        goto writedata;
    }
    /* infinity */
    if (x > DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 0;
        goto writedata;
    }
    /* -infinity */
    if (x < -DBL_MAX)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (1 << 31);
        lowlong = 0;
        goto writedata;
    }
    /* NaN - dodgy because many compilers optimise out this test, but
    *there is no portable isnan() */
    if (x != x)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 1234;
        goto writedata;
    }

    /* get the sign */
    if (x < 0) { sign = 1; fnorm = -x; }
    else { sign = 0; fnorm = x; }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; shift++; }
    while (fnorm < 1.0) { fnorm *= 2.0; shift--; }

    /* check for denormalized numbers */
    if (shift < -1022)
    {
        while (shift < -1022) { fnorm /= 2.0; shift++; }
        shift = -1023;
    }
    /* out of range. Set to infinity */
    else if (shift > 1023)
    {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (sign << 31);
        lowlong = 0;
        goto writedata;
    }
    else
        fnorm = fnorm - 1.0; /* take the significant bit off mantissa */

    /* calculate the integer form of the significand */
    /* hold it in a  double for now */

    significand = fnorm * ((1LL << significandbits) + 0.5f);


    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1); /* shift + bias */

    /* put the data into two longs (for convenience) */
    hibits = (long)(significand / 4294967296);
    hilong = (sign << 31) | (exp << (31 - expbits)) | hibits;
    x = significand - hibits * 4294967296;
    lowlong = (unsigned long)(significand - hibits * 4294967296);

writedata:
    /* write the bytes out to the stream */
    if (bigendian)
    {
        fputc((hilong >> 24) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc(hilong & 0xFF, fp);

        fputc((lowlong >> 24) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc(lowlong & 0xFF, fp);
    }
    else
    {
        fputc(lowlong & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 24) & 0xFF, fp);

        fputc(hilong & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}

Addressed. Code now in. (Link also has single precision but it follows straightforwardsly). — Malcolm McLean, Aug 27 '16 at 12:13

score -1 · Answer 10 · answered Nov 26 '09 at 02:12

-1

fwrite(), fread()? You will likely want binary, and you cannot pack the bytes any tighter unless you want to sacrifice precision which you would do in the program and then fwrite() fread() anyway; float a; double b; a=(float)b; fwrite(&a,1,sizeof(a),fp);

If you are carrying different floating point formats around they may not convert in a straight binary sense, so you may have to pick apart the bits and perform the math, this to the power that plus this, etc. IEEE 754 is a dreadful standard to use but widespread so it would minimize the effort.

answered Nov 26 '09 at 02:12

old_timer

69,149
8
89
168

The question clearly asks about a portable method, which this is obviously not. – Kevin Cox Dec 01 '14 at 21:21
"floating point" is by definition not portable, there are numerous formats and the specific format was not specified. C isnt very portable either, the question was flawed at best. – old_timer Dec 01 '14 at 21:55

C - Serialization of the floating point numbers (floats, doubles)

10 Answers10

5.2.4.2.2 Characteristics of floating types <float.h>

Linked