
I'm trying to follow an article at https://accu.org/index.php/journals/2317. I found it interesting now that I'm diving into serialization / file saving / loading for some administrative tools I'm working on. I tagged it c++ because I'll wrap it in a class once I make it work (hopefully with your help).

Now, this is for learning purposes, as I don't intend to use boost or any other serialization library at the moment. I want to find out how these things work.

So far I have had to make some slight modifications to get it to work, because the assertions were always false and I had a problem with memcpy trying to write outside allocated memory.

Right now, it compiles, but when I Serialize and Deserialize the structs, the data is not the same.

Please review and help me out, or point me in the right direction. After the code, I'll try to explain how I understand it.

#include <iostream>
#include <cassert>

struct Y
{
    int yy;
};

struct X
{
    int xx;
    struct Y* y = nullptr;
    int z;
};

// Changed OutMemStram and InMemStream for IOMemStream, same data
struct IOMemStream
{
    // changed from uint8_t* to char*
    char* pp;
    char* ppEnd;
};

// Output
inline void WriteToStream( IOMemStream* dst, void* p, size_t sz )
{
    dst->pp = (char*)p; // original code doesn't contain this line
    dst->ppEnd = (char*)p + sz; // original code doesn't contain this line

    assert( (dst->pp + sz) <= dst->ppEnd );
    memcpy( dst->pp, p, sz );
    dst->pp += sz;
}

void SerializeX( IOMemStream* dst, X* x )
{
    WriteToStream( dst, x, sizeof( X ) );
    WriteToStream( dst, x->y, sizeof( Y ) );
}

// Input
inline void ReadFromStream( IOMemStream* src, void* p, size_t sz ) 
{
    //assert( (src->pp + sz) <= src->ppEnd );

    memcpy( p, src->pp, sz );
    src->pp += sz;
}

void DeserializeX( IOMemStream* src, X* x ) 
{
    ReadFromStream( src, x, sizeof( X ) );
    // x->y contains garbage at this point(!)
    // ok, not exactly garbage - but a pointer
     // which is utterly invalid in our current space
    x->y = new Y;
    assert( x->y );
    ReadFromStream( src, x->y, sizeof( Y ) );
}


// Usage sample
int main()
{
    // Assume struct x was previously filled by other function
    X x;
    x.xx = 1000;
    x.z = 2000;
    x.y = new Y;
    x.y->yy = 3000;

    // IO buffer
    IOMemStream ioms;

    // Test for output
    SerializeX( &ioms, &x );

    // Test for input
    X x1;
    DeserializeX( &ioms, &x1 );

    // x1.xx should be 1000 and x1.z should be 2000
    std::cout << x1.xx << ", " << x1.z << std::endl;

    delete x.y;
    delete x1.y;
    //delete ioms.pp; // gets exception

    std::cin.get();
    return 0;
}

This is how I understand it (or fail to understand it).

  1. Struct X contains 2 ints and 1 pointer to a Y struct, so, assuming int size = 4, X's size will be 12 bytes. Y is 4 bytes.
  2. IOMemStream contains pointers for both struct X and struct Y.
  3. IOMemStream->pp and IOMemStream->ppEnd should have 12 bytes and 4 bytes from the struct X and X->Y.
  4. Function WriteToStream packs the bytes from the structs to the pp and ppEnd pointers. After X is assigned, the pointer is incremented by the size to get ready for the Y struct. (In the original article, the assertion is made without assigning the variables pp and ppEnd from IOMemStream.)
  5. Function SerializeX uses WriteToStream for both X and Y structs.

Almost the same for the Deserialization but in inverse order and ReadFromStream will allocate memory for struct Y.

I got lost from here, as the values are not the same from serialization and deserialization. Also, I hope I understood it properly :D

Thank you in advance!

  • Did you run this code under valgrind, or compile with address sanitizer? – EOF May 25 '20 at 19:58
  • Hi! No, sorry. I need to investigate what valgrind or address sanitizer is. I'm really a newbie, I finished some tutorials online and I'm trying to code my own tools and understand other people's code as an exercise. – Jonathan Michel May 25 '20 at 20:02
    Is there a particular reason the line `//assert( (src->pp + sz) <= src->ppEnd );` is commented out? Did this line perhaps cause some kind of problem when running the code? – EOF May 25 '20 at 20:07
  • Yes, I commented it out because the assertion was always false, even though it was using the same variable (IOMemStream) that was passed to WriteToStream, where the assertion was true (ioms.pp < ioms.ppEnd). – Jonathan Michel May 25 '20 at 20:10
    Maybe you should consider why the assertion was false? Perhaps, and I know this is a lot to ask, you should ask yourself if that assertion maybe could *possibly* be there for a reason? Maybe, and don't take this the wrong way, you might have overlooked something relating to that assertion? – EOF May 25 '20 at 20:39
  • Well, as far as I understand, the pp starting address should be less than the ppEnd starting address, so it can have enough room to store struct X and ppEnd struct Y. What I find strange is that I did the same assertion moments before inside WriteToStream and it came out true. – Jonathan Michel May 25 '20 at 20:58
  • Fascinating, isn't it? But you have two choices: 1) assume that the failing assertion is a compiler bug. 2) accept that *somewhere* between the two assertions you are violating the underlying assumptions of your program. If you chose the right alternative, you may come to the conclusion that removing the second assertion was not the best idea, and may have been counterproductive for your attempt at fixing your program. – EOF May 25 '20 at 21:07

2 Answers

Disclaimer: This is just a bunch of comments which are too long to be written as comments, instead of a proper answer.


Things are not so simple and some of your assumptions are wrong. In particular, even if an int takes 4 bytes and a pointer takes 8 bytes, your X struct can take more than 16 bytes due to padding.

Try adding the code below to your main and see what you get

    X x;
    std::cout << sizeof(x.xx) << std::endl;
    std::cout << sizeof(x.y) << std::endl;
    std::cout << sizeof(x.z) << std::endl;
    std::cout << sizeof(x) << std::endl;

On my system I get 4 bytes for each integer and 8 bytes for the pointer, but 24 bytes for the struct. The difference is padding (see this question for more). Notice that the order in which you declare the members of the struct has an effect on the padding. If instead you declared

struct X {
    struct Y* y = nullptr;
    int xx;
    int z;
};

or

```c++
struct X {
    int xx;
    int z;
    struct Y* y = nullptr;
};
```

then the struct would require only 16 bytes (at least on my system).


The other problem is serializing pointers. Obviously, storing the pointer value is of little use, because the address means nothing once the data is loaded again. You will need to think about how to handle the struct Y: during deserialization you have to create an object of type Y and then put its address in the deserialized struct X. The details can change depending on the rest of the program, and here we only have example code.


Finally, be careful with memcpy, since it just copies the bytes of the object as they are in memory (probably including padding) and does not care about endianness.

darcamo
  • That's the strange thing. I get 4 bytes for each member and 12 bytes for the whole struct. I also looked at sizeof(x) with the debugger's inspection tool and it shows 12 bytes as well. I should not have to worry about endianness, as I will use this tool on my own machine. As for the rest of the program, this is the whole program; I wanted to see if this code worked so I could modify it to suit my needs with my own classes and data. Probably I'm wrong, but I want to know how this works, and using external libraries like (and not limited to) boost feels like cheating :( – Jonathan Michel May 25 '20 at 21:05
  • Are you using a 32-bit system? That would explain why the pointer only requires 4 bytes. With only 4 bytes a pointer can address up to 4 GB of memory. In 64-bit systems pointers usually occupy 8 bytes. Also, since the pointer and the ints use the same amount of memory in your case, there is no need for padding and the struct would indeed only require 12 bytes. – darcamo May 25 '20 at 21:17
  • I'm compiling to x86 (the default selected), for no big reason at all. This looks like ISOCPP's serialization example (inheritance with pointer), which I couldn't figure out either (yet). But it looks interesting, and it somewhat makes the code look "sexy" :D – Jonathan Michel May 25 '20 at 21:31

I read the article a little more, and it seems the data they receive in the stream object comes from the machine. That is probably why there are pp and ppEnd, with the "before and after" state saved there. Plus, the functions seem to be made only for that purpose.

Too bad, as I thought it was good code to learn from. I probably got frustrated with it and misunderstood the concept.

Sorry to waste your time, guys! But I won't give up :)

Any books or sites to learn more about serialization that you know of? I don't want to use any external libraries at the moment because I want to learn more and practice doing my own tools. Using external libraries feels to me like cheating for some reason and makes me feel I'm not going through the right path.

thank you, greetings!