7

I'd like to implement the binary serialization on my own, without using Boost or any other third-party library.

In C++ the simplest way to achieve it is to use ofstream and then send the binary file over the network. But is there any other stream class which I can use as a temporary buffer, to avoid writing a file to disk?

Also, how can I achieve that in pure C?

Andrey Ermakov
Secret
  • 2
    Are you looking for [`std::stringstream`](http://en.cppreference.com/w/cpp/io/basic_stringstream)? – Florian Sowade Jun 12 '12 at 19:15
  • 1
    `std::ostream` can be used as a buffer although binary serialization generally implies saving the object and state to disk. – AJG85 Jun 12 '12 at 19:18
  • You state `the aim is "to understand the process fully and not just calling/using some ready stuff"` and you also know that boost does what you're looking for... why don't you simply review how boost does it so you'll understand how it works? – mah Jun 12 '12 at 19:22
  • Google for 'write' and 'read'. – David Rodríguez - dribeas Jun 12 '12 at 19:36
  • 1
    @mah Because Boost wasn't designed for educational purposes but for practical use, its serialization code isn't clear enough. – Secret Jun 12 '12 at 20:08

4 Answers

18

Persistence is a hard issue. Even serializing an object to disk is not trivial. Say that, for example, you have a structure like this one in C:

struct Person {
    char name[100];
    int year;
};

This is a self-contained structure, probably the simplest case to which serialization can be applied. However, you'll have to face the following problems:

  1. The compiler's padding. How a structure is padded in memory, so that its members are aligned and it occupies a whole number of words, is not standardized and varies between compilers.

  2. The way the operating system and the machine itself represent data in binary form. This representation (for example, byte order and the size of an int) changes from one machine to another.

The conclusion is that a file written by a program may not be readable by the very same program on the same operating system, because the two builds may have been compiled with different C compilers.
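Both problems can be observed directly. The following is a minimal sketch, not part of the original answer: the `Sample` struct and the helper names `padding_bytes` and `is_little_endian` are made up for illustration, and the results depend on your compiler and machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical struct, for illustration only.
struct Sample {
    char tag;             // 1 byte
    std::uint32_t value;  // 4 bytes, usually aligned to a 4-byte boundary
};

// Bytes the compiler added on top of the members themselves (problem 1).
// The result depends on the compiler and its packing settings.
constexpr std::size_t padding_bytes() {
    return sizeof(Sample) - (sizeof(char) + sizeof(std::uint32_t));
}

// Returns true when the least significant byte is stored first (problem 2).
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // look at the first byte in memory
    return first == 1;
}
```

On many common desktop compilers `padding_bytes()` is 3, but nothing guarantees it, which is exactly why dumping `sizeof(Sample)` bytes does not give a portable format.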

Now let's see an object in C++:

class Person {
public:
    // more things...

private:
    string name;
    Date * birth;
    Firm * firm;
};

Now the very same thing has become really complex. The object is no longer self-contained: you have to follow the pointers in order to decide how to deal with each referenced object (problem 3, known as pointer swizzling and transitive persistence). And you still have problems 1 and 2.

So let's say that you focus on self-contained objects, and still need a solution for points 1 and 2. The only way to go is to decide on a representation in either a) text format or b) bytecode format. A bytecode format can be understood by any program on any operating system, compiled with any compiler, because the information is read and written byte by byte in a fixed, agreed order. This is the way that Java or C# serialize their objects. A text format is as valid a representation as bytecode, though slower. Its main advantage is that it can be understood by a human being as well as by the computer (a structured text format could be XML).
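As a rough illustration of "byte by byte", here is a minimal sketch that stores a 32-bit integer in a fixed byte order, independent of the host machine. The names `write_u32` and `read_u32`, and the choice of most-significant-byte-first order, are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 32-bit value as four bytes, most significant byte first.
void write_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Rebuild the value from the four bytes starting at `pos`.
// The caller must ensure that pos + 4 <= in.size().
std::uint32_t read_u32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    return (static_cast<std::uint32_t>(in[pos]) << 24) |
           (static_cast<std::uint32_t>(in[pos + 1]) << 16) |
           (static_cast<std::uint32_t>(in[pos + 2]) << 8) |
           static_cast<std::uint32_t>(in[pos + 3]);
}
```

Because only shifts and masks are used, the bytes produced are the same on little-endian and big-endian machines.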

So, in order to serialize your self-contained objects, whatever output format you choose, you need basic functions (or classes in C++) that are able to read and write ints, chars, strings, and so on. Once you have a write/read pair for each of them, you'll have to give the programmer the possibility of creating her own write/read pairs for her objects, using your write/read pairs for the elemental data.
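One possible shape for such pairs is sketched below. Everything here is an assumption made for illustration: the `Bytes` alias, the function names, the length-prefixed string layout, and the use of a fixed-width `std::uint32_t` for the year instead of a plain `int`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Primitive pair: 32-bit unsigned integer, most significant byte first.
void write_u32(Bytes& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

std::uint32_t read_u32(const Bytes& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4)
        throw std::runtime_error("truncated input");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v = (v << 8) | in[pos++];
    return v;
}

// Primitive pair: string as a length prefix followed by the raw characters.
void write_string(Bytes& out, const std::string& s) {
    write_u32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

std::string read_string(const Bytes& in, std::size_t& pos) {
    const std::uint32_t len = read_u32(in, pos);
    if (in.size() - pos < len) throw std::runtime_error("truncated input");
    std::string s(in.begin() + pos, in.begin() + pos + len);
    pos += len;
    return s;
}

// User-defined pair, built only from the primitive pairs above.
struct Person {
    std::string name;
    std::uint32_t year;
};

void write_person(Bytes& out, const Person& p) {
    write_string(out, p.name);
    write_u32(out, p.year);
}

Person read_person(const Bytes& in, std::size_t& pos) {
    Person p;
    p.name = read_string(in, pos);  // fields are read in the order written
    p.year = read_u32(in, pos);
    return p;
}
```

The reader takes `pos` by reference so that consecutive reads advance through the buffer, which is what lets user-defined pairs be composed out of the primitive ones.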

We are talking here about a complete framework, something like what Python offers with its pickle module.

Finally, being able to keep your serialization in memory instead of saving it to disk is the least of your problems. You could use the ostringstream class if you are using a text-based format, or a memory block if you are using bytecode.
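For the in-memory buffer itself, `std::ostringstream` also accepts raw bytes through `write()` and `put()`, so it can stand in for `ofstream` without touching the disk. A minimal sketch follows; the function name `build_message` and the message layout (a 2-byte id, a 1-byte length, then the text) are invented for this example.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Serialize a 16-bit id and a short text into memory instead of a file.
std::string build_message(std::uint16_t id, const std::string& text) {
    if (text.size() > 255)
        throw std::length_error("text does not fit a 1-byte length prefix");

    std::ostringstream out(std::ios::binary);

    // Id as two bytes, most significant byte first.
    const char id_bytes[2] = {static_cast<char>(id >> 8),
                              static_cast<char>(id & 0xFF)};
    out.write(id_bytes, 2);

    // One-byte length prefix, then the characters themselves.
    out.put(static_cast<char>(text.size()));
    out.write(text.data(), static_cast<std::streamsize>(text.size()));

    return out.str();  // the buffer contents, embedded zero bytes included
}
```

The returned string is just a byte container here: its `data()` and `size()` can be handed to whatever call you use to send bytes over the network.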

As you can see, it is not a simple job. Hope this helps.

Baltasarq
  • 1
    Thanks a lot for such a good, 5-star answer. I asked because I need to serialize a structure that is used as an SSL message for a server. I'm trying to work with SSL manually, without OpenSSL or anything else, and have become interested in hand-written binary serialization too. That's why I asked. Thanks again! – Secret Jun 12 '12 at 20:38
3

I have been using JSON for serializing data. It is simple, which is a very good thing. It is easy to get JSON right, and easy to tell if anything goes wrong with it.

It is not as space-efficient as other formats, but for many purposes it is good enough. And there is free library code you can get from the JSON web site.

http://json.org/

steveha
2

In pure C you can use the Binn format.

Sample code:

  binn *obj;

  // create a new object
  obj = binn_object();

  // add values to it
  binn_object_set_int32(obj, "id", 123);
  binn_object_set_str(obj, "name", "John");
  binn_object_set_double(obj, "total", 2.55);

  // send over the network or save to a file...
  // (sock is assumed to be a connected socket; 0 means no send flags)
  send(sock, binn_ptr(obj), binn_size(obj), 0);

  // release the buffer
  binn_free(obj);

Disclaimer: I am the creator of Binn.

Bernardo Ramos
-7

In some cases, when dealing with simple types, you can do:

object o; 
socket.write(&o, sizeof(o)); 
Secret
  • 8
    Since people have downvoted this, but didn't comment, I'll explain what's wrong: Due to alignment issues with structs, most compilers pad the member variables of structs to align them to certain multiples of bytes. See: http://stackoverflow.com/questions/119123/why-isnt-sizeof-for-a-struct-equal-to-the-sum-of-sizeof-of-each-member As such, this code may not work properly if the data is written by code compiled by one compiler for one platform, and read by code compiled by another compiler, or the same compiler for another platform. The code is cross-platform, but the data it produces isn't. – Jamin Grey May 01 '13 at 19:24
  • This results in improperly aligned data when using compilers that differ in their methods even a slight bit. Furthermore, the data won't be portable across different architectures, compilers, or even operating systems in some cases. -1 – Michael J. Gray Nov 20 '14 at 00:43
  • As much as I like to discourage the use of pragmas, I believe "pragma pack" largely solves this issue. I believe it is supported on GCC, clang, ICC, and MSVC, so it may be reasonable. The reason I bring it up, is because it's a good thing to read up on and understand what it is and why you need it. Additionally, because this answer doesn't show us the structure and whether or not it uses pragma pack, we can't be certain whether this works or not. It is possible that this code works. – Christopher Mauer Sep 08 '21 at 08:02