13

How can I serialize doubles and floats in C?

I have the following code for serializing shorts, ints, and chars.

unsigned char * serialize_char(unsigned char *buffer, char value)
{
    buffer[0] = value;
    return buffer + 1;
}

unsigned char * serialize_int(unsigned char *buffer, int value)
{
    buffer[0] = value >> 24;
    buffer[1] = value >> 16;
    buffer[2] = value >> 8;
    buffer[3] = value;
    return buffer + 4;
}

unsigned char * serialize_short(unsigned char *buffer, short value)
{
    buffer[0] = value >> 8;
    buffer[1] = value;
    return buffer + 2;
}

Edit:

I found these functions from this question

Edit 2:

The purpose of serializing is to send data to a UDP socket and guarantee that it can be deserialized on the other machine even if the endianness is different. Are there any other "best practices" to perform this functionality given that I have to serialize ints, doubles, floats, and char*?

Community
Trevor
  • 2
    This seems pointless - you end up with a buffer containing the number that is the same size as the number. What do you think these functions achieve? Why not use memcpy(), for example? –  Aug 05 '10 at 19:22
  • 7
    @Neil Butterworth His functions are independent of the host endianness. – nos Aug 05 '10 at 19:27
  • @Neil Butterworth Yes they are, the CODE determines that the most significant byte is written at the lowest address, not the host endianness – S.C. Madsen Aug 05 '10 at 19:30
  • @nos Sorry, yes you are right. Assuming the same sized types at both ends, of course. –  Aug 05 '10 at 19:32
  • 1
    Well they try to be, but it's assuming ints are int32_t's etc. and more care should be taken regarding right shifts of signed ints. – nos Aug 05 '10 at 19:33
  • 1
    Give him credit for realizing he needs to do this at all.... I've seen too many "file formats" defined without concern for byte order or sizeof(int) to count. Since he's assigning just the least significant byte of the right-shifted result to an unsigned char, it should all work out. He's screwed if `sizeof(int)==2` on some platform though. – RBerteig Aug 05 '10 at 20:48

9 Answers

13

The portable way: use frexp to serialize (convert to integer mantissa and exponent) and ldexp to deserialize.
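As a rough sketch of the frexp/ldexp route: the function names and the 11-byte wire layout below (one sign byte, a 16-bit exponent, a 64-bit integer mantissa, all big-endian) are assumptions made for illustration, not something this answer specifies. It handles finite values only and assumes a double with at most 53 mantissa bits.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical layout: byte 0 = sign, bytes 1-2 = frexp exponent as
 * 16-bit two's complement, bytes 3-10 = 53-bit integer mantissa.
 * Everything is written most significant byte first. Finite values only. */
unsigned char *serialize_double_frexp(unsigned char *buffer, double value)
{
    int exponent;
    double fraction = frexp(fabs(value), &exponent);    /* 0, or in [0.5, 1) */
    uint64_t mantissa = (uint64_t)ldexp(fraction, 53);  /* scale to an integer */
    uint16_t exp_bits = (uint16_t)(int16_t)exponent;
    int i;

    buffer[0] = signbit(value) ? 1 : 0;
    buffer[1] = (unsigned char)(exp_bits >> 8);
    buffer[2] = (unsigned char)(exp_bits & 0xFF);
    for (i = 0; i < 8; i++)
        buffer[3 + i] = (unsigned char)((mantissa >> (56 - 8 * i)) & 0xFF);
    return buffer + 11;
}

const unsigned char *deserialize_double_frexp(const unsigned char *buffer,
                                              double *value)
{
    uint16_t exp_bits = (uint16_t)((buffer[1] << 8) | buffer[2]);
    /* Undo the two's complement encoding without relying on casts. */
    int exponent = (exp_bits & 0x8000u) ? (int)exp_bits - 65536 : (int)exp_bits;
    uint64_t mantissa = 0;
    double result;
    int i;

    for (i = 0; i < 8; i++)
        mantissa = (mantissa << 8) | buffer[3 + i];

    result = ldexp((double)mantissa, exponent - 53);    /* undo the scaling */
    *value = buffer[0] ? -result : result;
    return buffer + 11;
}
```

Because frexp returns a fraction in [0.5, 1), scaling it by 2^53 gives an exact integer for a 53-bit mantissa, so the round trip should be lossless on such platforms.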

The simple way: assume in 2010 any machine you care about uses IEEE float, declare a union with a float element and a uint32_t element, and use your integer serialization code to serialize the float.

The binary-file-haters way: serialize everything as text, floats included. Use the "%a" printf format specifier to get a hex float, which is always expressed exactly (provided you don't limit the precision with something like "%.4a") and not subject to rounding errors. You can read these back with strtod or any of the scanf family of functions.
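A minimal sketch of the text route, assuming a C99 library that supports "%a" in both snprintf and strtod. The two helper names are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Formats value as a hex float such as "0x1.8p+1".
 * Returns what snprintf returns (the length that was needed). */
int double_to_hex_text(char *text, size_t size, double value)
{
    return snprintf(text, size, "%a", value);
}

/* Parses a hex float (or any other strtod-compatible text). */
double hex_text_to_double(const char *text)
{
    return strtod(text, NULL);
}
```

Since no precision is given, the hex digits carry every bit of the mantissa, so parsing the string gives back the same value.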

R.. GitHub STOP HELPING ICE
  • `%a` isn't in C89, but is in C99. Notably, C99 also handles NaN and infinities better by specifying how `printf` formats them and `scanf` reads them. – RBerteig Aug 10 '10 at 21:22
  • Good point. If you need C89 compatibility, just write your own `printf("%a", f)` code. It only takes about 20 lines if you don't need support for non-finite arguments, and 10-15 more if you do. Unlike printing floating point numbers in decimal, printing them in hex is very easy and the trivial implementation does what you expect (i.e. it actually works). – R.. GitHub STOP HELPING ICE Aug 11 '10 at 04:52
  • `frexp` returns the integer exponent, but how does one get the mantissa as an integer? – jw013 Sep 05 '12 at 20:20
  • 1
    `frexp` returns the mantissa in the range [1,2). So just scale by 2^23 or 2^52 and then cast to the appropriate integer type. – R.. GitHub STOP HELPING ICE Sep 05 '12 at 22:08
  • 1
    @R.. Actually, I believe it's in the range [0.5,1), is it not? – Alexis King Jul 30 '13 at 06:38
11

I remember first seeing the cast used in my example below in the good old Quake source code of the "rsqrt" routine, containing the coolest comment I'd seen at the time (Google it, you'll like it)

unsigned char * serialize_float(unsigned char *buffer, float value) 
{ 
    unsigned int ivalue = *((unsigned int*)&value); // warning assumes 32-bit "unsigned int"
    buffer[0] = ivalue >> 24;  
    buffer[1] = ivalue >> 16;  
    buffer[2] = ivalue >> 8;  
    buffer[3] = ivalue;  
    return buffer + 4; 
} 

I hope I've understood your question (and example code) correctly. Let me know if this was useful.
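For comparison, here is a sketch of the same idea that copies the bits with memcpy instead of dereferencing a cast pointer, which sidesteps the aliasing concern raised in the comments. It is a variant, not the code from this answer: the function names are invented, and it assumes float is 32 bits wide and uses the same representation (typically IEEE-754) on both machines.

```c
#include <stdint.h>
#include <string.h>

/* Writes the 32 bits of value most significant byte first.
 * Assumes sizeof(float) == 4. */
unsigned char *serialize_float_memcpy(unsigned char *buffer, float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* well-defined bit copy */
    buffer[0] = (unsigned char)(bits >> 24);
    buffer[1] = (unsigned char)(bits >> 16);
    buffer[2] = (unsigned char)(bits >> 8);
    buffer[3] = (unsigned char)bits;
    return buffer + 4;
}

/* Reads the four bytes back in the same order and rebuilds the float. */
const unsigned char *deserialize_float_memcpy(const unsigned char *buffer,
                                              float *value)
{
    uint32_t bits = ((uint32_t)buffer[0] << 24) |
                    ((uint32_t)buffer[1] << 16) |
                    ((uint32_t)buffer[2] << 8)  |
                     (uint32_t)buffer[3];
    memcpy(value, &bits, sizeof bits);
    return buffer + 4;
}
```

Compilers generally turn a fixed-size memcpy like this into a plain register move, so there is usually no cost compared with the cast.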

S.C. Madsen
  • 2
    I would add `char assumes_sz_float_eq_sz_int[(2*(int)(sizeof(int)==sizeof(float)))-1];` at the top of the function. – David X Aug 05 '10 at 23:37
  • @David X As a compile-time check? Good idea, I usually do that trick with enums, but i guess a negative array length works equally well – S.C. Madsen Aug 06 '10 at 04:39
  • @SCMadden, that would be `enum{foo=0,bar=(condition)};`, right? That actually might be better, although you get an extra namespace pollutant with `foo`. (Also, the cast to `int` above is useless; I think I thought the comparison was returning a `size_t` for some reason.) – David X Aug 07 '10 at 21:19
  • @David X I use enum{bar=1/(condition)}; And if its placed inside the .c/.cpp file I don't think it pollutes any namespaces. I thought sizeof returned size_t too... – S.C. Madsen Aug 08 '10 at 06:10
  • @SCMadsen, yeah, that looks like the best option. `sizeof` does return `size_t`, but for some reason i thought the operator `==` on two `size_t`s would return a `size_t`, but it returns a `int`, so the cast to int is pointless. – David X Aug 08 '10 at 22:30
  • 3
    For those of you looking for the comment, see [this](http://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code) Wikipedia link – thegreendroid Jan 10 '13 at 02:27
  • @thegreendroid Yup, that's it... ah the memories :-) – S.C. Madsen Jan 31 '13 at 09:18
  • 1
    This is an example of "type punning" and it's not super safe. Lots of examples on SO, e.g. http://stackoverflow.com/questions/222266/safely-punning-char-to-double-in-c – Cuadue Jul 28 '14 at 18:27
  • **Note:** Casting and dereferencing non-compatible type is *strict aliasing violation*. This code is not compatible with standard C, and will likely require use of compiler specific options to make it reliable (which makes it less portable). "Good" *old* Quake rsqrt had the same problem. – user694733 Mar 13 '18 at 11:43
  • @user694733: Yes I've heard that criticism before, but never once encountered a compiler or CPU-type where the code didn't behave as I assumed. Secondly: The criticism is yet to be followed by a (in my view) better solution. – S.C. Madsen Oct 19 '18 at 06:56
  • 2
    *"but never once encountered a compiler"* Alias based optimizations have a bad habit of generating seemingly working code *most* of the time. Problem is that there are no guarantees. *"criticism is yet to be followed by a (in my view) better solution."* Type punning through union, or copy with `memcpy` is well defined within standard. – user694733 Oct 19 '18 at 07:14
8

This packs a floating point value into an int and long long pair, which you can then serialise with your other functions. The unpack() function is used to deserialise.

The pair of numbers represent the exponent and fractional part of the number respectively.

#include <math.h>   /* frexp, ldexp, fabs */
#include <stdlib.h> /* llabs */

#define FRAC_MAX 9223372036854775807LL /* 2**63 - 1 */

struct dbl_packed
{
    int exp;
    long long frac;
};

void pack(double x, struct dbl_packed *r)
{
    double xf = fabs(frexp(x, &r->exp)) - 0.5;

    if (xf < 0.0)
    {
        r->frac = 0;
        return;
    }

    r->frac = 1 + (long long)(xf * 2.0 * (FRAC_MAX - 1));

    if (x < 0.0)
        r->frac = -r->frac;
}

double unpack(const struct dbl_packed *p)
{
    double xf, x;

    if (p->frac == 0)
        return 0.0;

    xf = ((double)(llabs(p->frac) - 1) / (FRAC_MAX - 1)) / 2.0;

    x = ldexp(xf + 0.5, p->exp);

    if (p->frac < 0)
        x = -x;

    return x;
}
caf
6

You can portably serialize in IEEE-754 regardless of the native representation:

int fwriteieee754(double x, FILE * fp, int bigendian)
{
    int                     shift;
    unsigned long           sign, exp, hibits, hilong, lowlong;
    double                  fnorm, significand;
    int                     expbits = 11;
    int                     significandbits = 52;

    /* zero (can't handle signed zero) */
    if(x == 0) {
        hilong = 0;
        lowlong = 0;
        goto writedata;
    }
    /* infinity */
    if(x > DBL_MAX) {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 0;
        goto writedata;
    }
    /* -infinity */
    if(x < -DBL_MAX) {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        hilong |= (1UL << 31);  /* unsigned constant: 1 << 31 overflows a 32-bit int */
        lowlong = 0;
        goto writedata;
    }
    /* NaN - dodgy because many compilers optimise out this test
     * isnan() is C99, POSIX.1 only, use it if you will.
     */
    if(x != x) {
        hilong = 1024 + ((1 << (expbits - 1)) - 1);
        hilong <<= (31 - expbits);
        lowlong = 1234;
        goto writedata;
    }

    /* get the sign */
    if(x < 0) {
        sign = 1;
        fnorm = -x;
    } else {
        sign = 0;
        fnorm = x;
    }

    /* get the normalized form of f and track the exponent */
    shift = 0;
    while(fnorm >= 2.0) {
        fnorm /= 2.0;
        shift++;
    }
    while(fnorm < 1.0) {
        fnorm *= 2.0;
        shift--;
    }

    /* check for denormalized numbers */
    if(shift < -1022) {
        while(shift < -1022) {
            fnorm /= 2.0;
            shift++;
        }
        shift = -1023;
    } else {
        /* take the significant bit off mantissa */
        fnorm = fnorm - 1.0;
    }
    /* calculate the integer form of the significand */
    /* hold it in a  double for now */

    significand = fnorm * ((1LL << significandbits) + 0.5f);

    /* get the biased exponent */
    exp = shift + ((1 << (expbits - 1)) - 1);   /* shift + bias */

    /* put the data into two longs */
    hibits = (long)(significand / 4294967296);  /* 0x100000000 */
    hilong = (sign << 31) | (exp << (31 - expbits)) | hibits;
    lowlong = (unsigned long)(significand - hibits * 4294967296);

 writedata:
    /* write the bytes out to the stream */
    if(bigendian) {
        fputc((hilong >> 24) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc(hilong & 0xFF, fp);

        fputc((lowlong >> 24) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc(lowlong & 0xFF, fp);
    } else {
        fputc(lowlong & 0xFF, fp);
        fputc((lowlong >> 8) & 0xFF, fp);
        fputc((lowlong >> 16) & 0xFF, fp);
        fputc((lowlong >> 24) & 0xFF, fp);

        fputc(hilong & 0xFF, fp);
        fputc((hilong >> 8) & 0xFF, fp);
        fputc((hilong >> 16) & 0xFF, fp);
        fputc((hilong >> 24) & 0xFF, fp);
    }
    return ferror(fp);
}

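On machines using IEEE-754 (i.e. the common case) with the same byte order as the stream, all you'll need to do to get the number back is an fread(). Otherwise, decode the bytes yourself: (-1)^sign * 2^(exponent-1023) * 1.mantissa for a normalized double (the bias is 127 only for single precision).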
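Decoding by hand might look roughly like the sketch below. The function name is made up, and it assumes the eight bytes were written with the bigendian flag set, so the most significant byte comes first. It rebuilds the value with ldexp, so it does not depend on the native representation of double.

```c
#include <math.h>
#include <stdint.h>

/* Decodes an IEEE-754 binary64 value stored most significant byte first. */
double decode_ieee754_double(const unsigned char *bytes)
{
    uint64_t bits = 0;
    uint64_t mantissa;
    int exponent, sign, i;
    double result;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | bytes[i];

    sign = (int)(bits >> 63);
    exponent = (int)((bits >> 52) & 0x7FF);   /* 11-bit biased exponent */
    mantissa = bits & 0xFFFFFFFFFFFFFULL;     /* low 52 bits */

    if (exponent == 0x7FF)                    /* all ones: infinity or NaN */
        result = mantissa ? NAN : INFINITY;
    else if (exponent == 0)                   /* zero or subnormal: no hidden bit */
        result = ldexp((double)mantissa, -1074);
    else                                      /* normal: restore the hidden bit */
        result = ldexp((double)(mantissa | (1ULL << 52)), exponent - 1075);

    return sign ? -result : result;
}
```

The constant 1075 is the exponent bias (1023) plus the 52 mantissa bits, because the mantissa is treated as an integer rather than as a fraction.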

Note: when serializing in systems where the native double is more precise than the IEEE double, you might encounter off-by-one errors in the low bit.

Hope this helps.

Michael Foukarakis
4

For the narrow question about float, note that you probably end up assuming that both ends of the wire are using the same representation for floating point. This might be safe today given the pervasive use of IEEE-754, but note that some current DSPs (I believe Blackfins) use a different representation. In the olden days, there were at least as many representations for floating point as there were manufacturers of hardware and libraries, so this was a bigger issue.

Even with the same representation, it might not be stored with the same byte order. That will necessitate deciding on a byte order on the wire, and tweaked code at each end. Either the type-punned pointer cast or the union will work in practice. Strictly speaking, the pointer cast violates C's strict aliasing rule and the union relies on the implementation-defined representation of the types, but as long as you check and test, that is usually not a big deal.

That said, text is often your friend for transferring floating point between platforms. The trick is to not use too many more characters than are really needed to convert it back.
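One way to keep the text short but still exact, sketched under the assumption that both ends use IEEE-754 single and double precision: 9 significant decimal digits are enough to round-trip a float and 17 are enough for a double. The helper names are invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* 9 significant digits identify an IEEE-754 single-precision value. */
int float_to_text(char *text, size_t size, float value)
{
    return snprintf(text, size, "%.9g", value);
}

/* 17 significant digits identify an IEEE-754 double-precision value. */
int double_to_text(char *text, size_t size, double value)
{
    return snprintf(text, size, "%.17g", value);
}

float text_to_float(const char *text)
{
    return strtof(text, NULL);
}

double text_to_double(const char *text)
{
    return strtod(text, NULL);
}
```

Using "%g" rather than "%f" keeps very large and very small values short, because it switches to exponent notation when that is more compact.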

All in all, I'd recommend giving some serious consideration to using a library such as XDR that is robust, has been around for a while, and has been rubbed up against all of the sharp corners and edge cases.

If you insist on rolling your own, take care about subtle issues like whether int is 16 bits, 32 bits, or even 64 bits in addition to representation of float and double.

RBerteig
2

You can always use unions to serialize:

void serialize_double (unsigned char* buffer, double x) {
    int i;
    union {
        double         d;
        unsigned char  bytes[sizeof(double)];
    } u;

    u.d = x;
    for (i=0; i<sizeof(double); ++i)
        buffer[i] = u.bytes[i];
}

This isn't really any more robust than simply casting the address of the double to a char*, but at least by using sizeof() throughout the code you are avoiding problems when a data type takes up more/less bytes than you thought it did (this doesn't help if you are moving data between platforms that use different sizes for double).

For floats, simply replace all instances of double with float. You may be able to build a crafty macro to auto-generate a series of these functions, one for each data type you are interested in.

bta
1

Following your update, you mention the data is to be transmitted using UDP and ask for best practices. I would highly recommend sending the data as text, perhaps even with some markup added (XML). Debugging endian-related errors across a transmission line is a waste of everybody's time.

Just my 2 cents on the "best practices" part of your question

Jonathan Leffler
S.C. Madsen
  • 4
    Although sending plain-text would be nice, one of the requirements is to use as little bandwidth as possible. – Trevor Aug 05 '10 at 19:49
  • 1
    @Trevor Sending text does not necessarily mean much extra bandwidth. For example, sending the integer 1 takes 4 bytes (on your platform) when sent as an int and 2 (assuming a separator) when sent as text. so this tends to even out. and text is far, far simpler to handle and debug. –  Aug 05 '10 at 19:52
  • Okidoki, then use the example-code I showed in my earlier answer, and let me know if it works for ya – S.C. Madsen Aug 05 '10 at 19:52
  • 2
    In the end, using plain text with a delimiter was the easiest way to go. It's only a few extra bytes per message compared with serializing floats and doubles into the message. Thanks. – Trevor Aug 06 '10 at 14:29
1

To start, you should never assume that short, int etc have the same width on both sides. It would be much better to use the uint32_t etc (unsigned) types that have known width on both sides.

Then, to be sure that you don't have problems with endianness, there are the macros/functions htonl, htons, ntohl, and ntohs, which are usually much more efficient than anything you can write on your own (on Intel hardware they are, e.g., just one assembler instruction). So you don't have to write conversion functions; basically they are already there. Just cast your buffer pointer to a pointer of the correct integer type.

For float you can probably assume that they are 32 bits wide and have the same representation on both sides. So I think a good strategy would be to use a pointer cast to uint32_t* and then the same strategy as above.
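A sketch of this strategy on a POSIX system, where htonl and ntohl come from <arpa/inet.h> (other platforms declare them elsewhere). The function names are invented, and the sketch uses memcpy rather than the pointer cast described above, because a network buffer is not guaranteed to be aligned for uint32_t. It assumes a 32-bit float with the same representation on both sides.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Stores value in network byte order (big-endian). */
unsigned char *serialize_u32_net(unsigned char *buffer, uint32_t value)
{
    uint32_t wire = htonl(value);
    memcpy(buffer, &wire, sizeof wire);   /* safe even if buffer is unaligned */
    return buffer + sizeof wire;
}

/* Reads a network byte order value back into host byte order. */
const unsigned char *deserialize_u32_net(const unsigned char *buffer,
                                         uint32_t *value)
{
    uint32_t wire;
    memcpy(&wire, buffer, sizeof wire);
    *value = ntohl(wire);
    return buffer + sizeof wire;
}

/* A float travels as its 32-bit pattern; assumes sizeof(float) == 4. */
unsigned char *serialize_float_net(unsigned char *buffer, float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return serialize_u32_net(buffer, bits);
}

const unsigned char *deserialize_float_net(const unsigned char *buffer,
                                           float *value)
{
    uint32_t bits;
    const unsigned char *next = deserialize_u32_net(buffer, &bits);
    memcpy(value, &bits, sizeof bits);
    return next;
}
```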

If you think you might have different representations of float you would have to split into mantissa and exponent. Probably you could use frexpf for that.

Jens Gustedt
0

You can use https://github.com/souzomain/Packer. This library serializes data and returns a buffer; you can also learn from reading its code.

Example:

PPACKER protocol = packer_init();
packer_add_data(protocol, yourstructure, sizeof(yourstructure));
send(fd, protocol->buffer, protocol->offset, 0); //use the buffer and the size
packer_free(protocol);

You can read the data back using:

recv(fd, buffer, size, 0);
size_t offset = 0;
yourstructure *data = (yourstructure *)packer_get_data(buffer, sizeof(yourstructure), &offset); /* the cast yields a pointer, so data must be a pointer */
souzomain