41

I'm writing some code to serialize some data to send it over the network. Currently, I use this primitive procedure:

  1. create a void* buffer
  2. apply any byte ordering operations such as the hton family on the data I want to send over the network
  3. use memcpy to copy the memory into the buffer
  4. send the memory over the network
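As a rough illustration of the four steps above, here is a minimal sketch for a hypothetical message with one 16-bit and one 32-bit field. The names `struct msg`, `type`, `length` and `pack_msg` are assumptions made for this example and are not part of any real protocol.

```c
#include <arpa/inet.h>  /* htons, htonl */
#include <stdint.h>
#include <stdlib.h>     /* malloc */
#include <string.h>     /* memcpy */

/* Hypothetical message; the field names are illustrative only. */
struct msg {
    uint16_t type;
    uint32_t length;
};

/* Steps 1-3: allocate a buffer, convert each field to network byte
 * order, and memcpy it into place. Returns NULL if malloc fails.
 * The caller sends *out_len bytes (step 4) and then frees the buffer. */
void *pack_msg(const struct msg *m, size_t *out_len) {
    uint16_t type_be = htons(m->type);
    uint32_t length_be = htonl(m->length);
    size_t len = sizeof(type_be) + sizeof(length_be);
    unsigned char *buf = malloc(len);

    if (buf == NULL)
        return NULL;

    memcpy(buf, &type_be, sizeof(type_be));
    memcpy(buf + sizeof(type_be), &length_be, sizeof(length_be));

    *out_len = len;
    return buf;
}
```

Every additional structure needs its own function of this shape, which is where the bloat comes from.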

The problem is that with various data structures (which often contain void* data so you don't know whether you need to care about byte ordering) the code becomes really bloated with serialization code that's very specific to each data structure and can't be reused at all.

What are some good serialization techniques for C that make this easier / less ugly?

-

Note: I'm bound to a specific protocol so I cannot freely choose how to serialize my data.

ryyst
  • If you didn't already have a fixed protocol [XDR](http://en.wikipedia.org/wiki/External_Data_Representation) is a quite widely used choice. It's hard to know what to say if you can't change the representation though. – Flexo May 14 '11 at 14:49

6 Answers

44

For each data structure, have a serialize_X function (where X is the struct name) that takes a pointer to an X and a pointer to an opaque buffer structure and calls the appropriate serializing functions. You should supply some primitives, such as serialize_int, which write to the buffer and update the output index.

Before writing any data, each primitive has to call something like reserve_space(N), where N is the number of bytes required. reserve_space() reallocs the void* buffer so that at least N bytes are available past the output index. To make this possible, the buffer structure needs to contain a pointer to the actual data, the index at which to write the next byte (the output index), and the size allocated for the data. With this system, all of your serialize_X functions should be pretty straightforward, for example:

struct X {
    int n, m;
    char *string;
};

void serialize_X(struct X *x, struct Buffer *output) {
    serialize_int(x->n, output);
    serialize_int(x->m, output);
    serialize_string(x->string, output);
}

And the framework code will be something like:

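#include <stdlib.h>  /* malloc, realloc */

#define INITIAL_SIZE 32

struct Buffer {
    void *data;   /* the serialized bytes */
    size_t next;  /* index of the next byte to write */
    size_t size;  /* number of bytes allocated for data */
};

struct Buffer *new_buffer(void) {
    /* no error handling: the malloc results are not checked */
    struct Buffer *b = malloc(sizeof(struct Buffer));

    b->data = malloc(INITIAL_SIZE);
    b->size = INITIAL_SIZE;
    b->next = 0;

    return b;
}

void reserve_space(struct Buffer *b, size_t bytes) {
    size_t new_size = b->size;

    /* double the size until the request fits; doubling keeps the
       number of reallocs logarithmic in the final size */
    while ((b->next + bytes) > new_size)
        new_size *= 2;

    if (new_size != b->size) {
        /* no error handling: the realloc result is not checked */
        b->data = realloc(b->data, new_size);
        b->size = new_size;
    }
}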

From this, it should be pretty simple to implement all of the serialize_() functions you need.

EDIT: For example:

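#include <arpa/inet.h>  /* htonl */
#include <stdint.h>     /* uint32_t */
#include <string.h>     /* memcpy */

void serialize_int(int x, struct Buffer *b) {
    /* htonl works on 32-bit values, so this assumes a 32-bit int;
       taking an int32_t parameter would make that explicit */
    uint32_t net = htonl((uint32_t)x);

    reserve_space(b, sizeof(net));

    memcpy(((char *)b->data) + b->next, &net, sizeof(net));
    b->next += sizeof(net);
}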

EDIT: Also note that my code has some potential bugs. There is no provision for error handling and no function to free the Buffer after you're done so you'll have to do this yourself. I was just giving a demonstration of the basic architecture that I would use.
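As a minimal sketch of how the gaps mentioned above could be filled, the following variant adds error returns, a function to release the buffer, and a `serialize_string` primitive. Several things here are assumptions rather than part of the code above: the names `buffer_init`, `buffer_free` and `serialize_uint32`, the caller-owned `struct Buffer` (instead of a heap-allocated one), the `unsigned char *` data pointer, and the string wire format (a 32-bit big-endian length followed by the bytes, with no terminating NUL). A fixed protocol may well dictate a different string encoding.

```c
#include <arpa/inet.h>  /* htonl */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_SIZE 32

struct Buffer {
    unsigned char *data;
    size_t next;  /* index of the next byte to write */
    size_t size;  /* bytes allocated for data */
};

/* Returns 0 on success, -1 if allocation fails. */
int buffer_init(struct Buffer *b) {
    b->data = malloc(INITIAL_SIZE);
    b->next = 0;
    b->size = b->data ? INITIAL_SIZE : 0;
    return b->data ? 0 : -1;
}

void buffer_free(struct Buffer *b) {
    free(b->data);
    b->data = NULL;
    b->next = 0;
    b->size = 0;
}

/* Make room for `bytes` more bytes. Returns 0 on success, -1 on
 * failure; on failure the buffer is left unchanged. */
int reserve_space(struct Buffer *b, size_t bytes) {
    size_t new_size = b->size ? b->size : 1;
    unsigned char *p;

    if (bytes > SIZE_MAX - b->next)
        return -1;                  /* next + bytes would overflow */
    while (b->next + bytes > new_size) {
        if (new_size > SIZE_MAX / 2)
            return -1;              /* doubling would overflow */
        new_size *= 2;
    }
    if (new_size <= b->size)
        return 0;                   /* already enough room */

    p = realloc(b->data, new_size);
    if (p == NULL)
        return -1;                  /* old block is still valid */
    b->data = p;
    b->size = new_size;
    return 0;
}

int serialize_uint32(uint32_t x, struct Buffer *b) {
    uint32_t net = htonl(x);

    if (reserve_space(b, sizeof(net)) != 0)
        return -1;
    memcpy(b->data + b->next, &net, sizeof(net));
    b->next += sizeof(net);
    return 0;
}

/* Assumed wire format: 32-bit big-endian length, then the bytes. */
int serialize_string(const char *s, struct Buffer *b) {
    size_t len = strlen(s);

    if (len > UINT32_MAX)
        return -1;
    /* reserve everything first so a failure writes nothing */
    if (reserve_space(b, sizeof(uint32_t) + len) != 0)
        return -1;
    if (serialize_uint32((uint32_t)len, b) != 0)
        return -1;
    memcpy(b->data + b->next, s, len);
    b->next += len;
    return 0;
}
```

A caller would run `buffer_init`, call the serialize functions while checking each return value, send `b.next` bytes starting at `b.data`, and finish with `buffer_free`.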

Community
jstanley
  • Love this little script; I had to study it in my first year at the academy XD – dynamic May 14 '11 at 15:04
  • It's also a good idea to prepend the file with some magic bytes and a version number, to quickly reject incoming data you definitely cannot process. Giving each structure its own version is also possible. And last but not least: be paranoid when parsing data "from the outside"; being careless makes your program vulnerable to attack, or unstable at least. – datenwolf May 14 '11 at 15:27
  • Using `b->data + b->next` is not portable because `b->data` has type `void *`. – Dietrich Epp Jun 24 '11 at 23:40
  • Of course, it needs to be said that this approach is entirely platform-specific and thus can't be used across machines or in networking scenarios. – Noldorin Feb 25 '14 at 22:09
  • @Noldorin How so? endianness issues? – vexe Aug 27 '15 at 19:30
  • @vexe: Yes, both byte and bit endianness for a start. Also simply the representation format used, especially for negative numbers (2's or 1's complement?). – Noldorin Aug 28 '15 at 19:50
  • @Noldorin which line(s) exactly? in `memcpy`? what would you suggest as an improvement? – vexe Aug 28 '15 at 20:04
  • To be honest, the easiest lossless serialisation method would be to stringify it. Not the most efficient, but otherwise you have to explicitly know about the memory representation yourself. – Noldorin Aug 28 '15 at 23:21
  • If you use this approach with structures, you might have problems with data portability even between programs built by the same compiler on the same platform. The compiler can end up deciding to use a different memory layout for the structure. Nasty heisenbugs ensue. And of course it's a huge minefield for those who come after you. The only safe approach I know is to avoid memcpy on structures and manually serialize all fields. It's also possible that the GCC packed attribute might ensure that memcpy would work for structures, but I haven't tried that. – Britton Kerin Apr 06 '18 at 21:37
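The portability concerns raised in the comments above (host endianness, struct layout) and the advice to be paranoid when parsing can be addressed by encoding each field byte by byte and bounds-checking every read. The following is a minimal sketch, not the answer's code: the helper names `put_u32_be` and `get_u32_be` are assumptions, and it swaps the `htonl` plus `memcpy` approach for explicit shifts.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v as 4 big-endian bytes. The shifts operate on the value,
 * not on its in-memory representation, so the output is the same
 * on little-endian and big-endian hosts. */
void put_u32_be(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Read 4 big-endian bytes starting at in[*pos], where `len` is the
 * total number of bytes in `in`. Returns 0 and advances *pos on
 * success. Returns -1 and leaves *pos and *v untouched if fewer
 * than 4 bytes remain. */
int get_u32_be(const unsigned char *in, size_t len, size_t *pos,
               uint32_t *v) {
    if (*pos > len || len - *pos < 4)
        return -1;

    in += *pos;
    *v = ((uint32_t)in[0] << 24) |
         ((uint32_t)in[1] << 16) |
         ((uint32_t)in[2] << 8) |
         (uint32_t)in[3];
    *pos += 4;
    return 0;
}
```

Because each field is written individually, struct padding never reaches the wire. A signed value can be cast to `uint32_t` before encoding, which yields its two's complement bit pattern regardless of the host's representation.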
5

I suggest using a library.

As I was not happy with the existing ones, I created the Binn library to make our lives easier.

Here is an example of using it:

  binn *obj;

  // create a new object
  obj = binn_object();

  // add values to it
  binn_object_set_int32(obj, "id", 123);
  binn_object_set_str(obj, "name", "Samsung Galaxy Charger");
  binn_object_set_double(obj, "price", 12.50);
  binn_object_set_blob(obj, "picture", picptr, piclen);

  // send over the network
  send(sock, binn_ptr(obj), binn_size(obj), 0);

  // release the buffer
  binn_free(obj);
Bernardo Ramos
  • I tried the binn library and it's pretty good. But I noticed there are a lot of serialization libraries that work well in some cases and very badly in others. So depending on the content, I would recommend trying a few. – Anton Krug Feb 16 '18 at 10:24
5

I would say definitely don't try to implement serialization yourself. It's been done a zillion times, and you should use an existing solution, e.g. protobufs: https://github.com/protobuf-c/protobuf-c

It also has the advantage of being compatible with many other programming languages.

Hasse Björk
Assaf Lavie
  • +1 - Using 3rd party external serialisation brings other benefits like tools that can inspect streams and view the objects they describe directly. – Flexo May 14 '11 at 15:27
1

It would help if we knew what the protocol constraints are, but in general your options are pretty limited. If the data are such that you can put each struct in a union with a byte array of sizeof(struct) bytes, that might simplify things. From your description, though, it sounds like you have a more fundamental problem: if you're transferring pointers (you mention void * data), then those pointers are very unlikely to be valid on the receiving machine. Why would the data happen to appear at the same place in memory?
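One way to send the pointed-to data rather than the pointer is to keep a tag next to each void * that says what it points at, and let a single function pick the encoding. The sketch below is only an illustration under assumptions: `enum field_kind`, `struct field` and `serialize_field` are hypothetical names, and the tag lives in memory only; it is not written to the wire, since the protocol is fixed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory tag describing what `data` points at. */
enum field_kind { FIELD_U32, FIELD_BYTES };

struct field {
    enum field_kind kind;
    const void *data;  /* points at a uint32_t or at raw bytes */
    size_t len;        /* byte count, used only for FIELD_BYTES */
};

/* Copy the pointed-to value (never the pointer itself) into `out`,
 * which has room for `cap` bytes. On success returns 0 and stores
 * the number of bytes written in *written. Returns -1 if the field
 * does not fit or the kind is unknown. */
int serialize_field(const struct field *f, unsigned char *out,
                    size_t cap, size_t *written) {
    switch (f->kind) {
    case FIELD_U32: {
        uint32_t v;

        if (cap < 4)
            return -1;
        memcpy(&v, f->data, sizeof(v));  /* no alignment assumption */
        /* integers need byte ordering: emit big-endian */
        out[0] = (unsigned char)((v >> 24) & 0xFF);
        out[1] = (unsigned char)((v >> 16) & 0xFF);
        out[2] = (unsigned char)((v >> 8) & 0xFF);
        out[3] = (unsigned char)(v & 0xFF);
        *written = 4;
        return 0;
    }
    case FIELD_BYTES:
        if (cap < f->len)
            return -1;
        /* raw bytes have no byte order: copy as-is */
        memcpy(out, f->data, f->len);
        *written = f->len;
        return 0;
    }
    return -1;
}
```

With the tag in place, the decision about byte ordering is made once per kind instead of once per data structure.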

Charlie Martin
  • I meant `void *` as a pointer to any data, such as a `uint32_t` or a `const char *` or a custom structure. In the end, I want to send the data, not the pointer of course :) – ryyst May 14 '11 at 14:59
0

For C programs, there are not a lot of good options for "automatic" serialization. Before giving up, I suggest reviewing the SUNRPC package (rpcgen and friends). It has:

  • A custom format, the "XDR" language (basically a subset of C), to describe data structures.
  • RPC generation, which makes it possible to automatically generate the client and server sides of the serialization.
  • A runtime library, shipped with (almost) all Unix environments.

The protocol and the encoding are Internet standards.

Community
dash-o
0

This library can help you. https://github.com/souzomain/Packer

It's easy to use, and the code is clean and easy to study.

Usage example:

PPACKER protocol = packer_init();
packer_add_data(protocol, yourstructure, sizeof(yourstructure));
send(fd, protocol->buffer, protocol->offset, 0);
packer_free(protocol);
souzomain