Having trouble serializing binary data using ifstream and ofstream

Question

I am trying to serialize a Plain Old Datastructure using ifstream and ofstream and I wasn't able to get it to work. I then tried to reduce my problem to an ultra basic serialization of just a char and int and even that didn't work. Clearly I'm missing something at a core fundamental level.

For a basic structure:

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    void Serialize(std::ofstream& ofs);
};

With serialize function:

void SerializeTestStruct::Serialize(std::ofstream& ofs)
{
    bool isError = (false == ofs.good());
    if (false == isError)
    {
        ofs.write((char*)&mCharVal, sizeof(mCharVal));
        ofs.write((char*)&mIntVal, sizeof(mIntVal));
    }
}

Why would this fail with the following short program?

//ultra basic serialization test.
    SerializeTestStruct* testStruct = new SerializeTestStruct();
    testStruct->mCharVal = 'y';
    testStruct->mIntVal = 9;

    //write
    std::string testFileName = "test.bin";
    std::ofstream fileOut(testFileName.data());
    fileOut.open(testFileName.data(), std::ofstream::binary|std::ofstream::out);
    fileOut.clear();
    testStruct->Serialize(fileOut);

    fileOut.flush();
    fileOut.close();

    delete testStruct;

    //read
    char * memblock;
    std::ifstream fileIn (testFileName.data(), std::ifstream::in|std::ifstream::binary);
    if (fileIn.is_open())
    {
        // get length of file:
        fileIn.seekg (0, std::ifstream::end);
        int length = fileIn.tellg();
        fileIn.seekg (0, std::ifstream::beg);

        // allocate memory:
        memblock = new char [length];
        fileIn.read(memblock, length);
        fileIn.close();

        // read data as a block:
        SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();

        delete[] testStruct2;
    }

When I run through the code I notice that memblock has a "y" at the top so maybe it is working and it's just a problem with the placement new at the very end? After that placement new I end up with a SerializeTestStruct with values: 0, 0.

Why don't you use the insertion/extraction operators with the streams? — Andrew Rasmussen, Apr 18 '11 at 19:31
This is already a solved problem in C++ if you're allowed to use other libraries. Are you doing this just to benefit yourself? – wheaties, Apr 18 '11 at 19:37
@arasmussen: The insertion/extraction operators are for formatted text i/o. He wants unformatted binary i/o. — Benjamin Lindley, Apr 18 '11 at 19:41
Boost and other libraries are not an option. Yeah just benefiting myself. — MukMuk, Apr 18 '11 at 19:46
Don't use the `data` member function of `::std::string` here. In fact `data` is almost never what you want to use. Use `c_str` instead. The problem is that `data` does not guarantee that the `const char *` you get back is pointing to a `'\0'` terminated string. — Omnifarious, Apr 18 '11 at 20:01

Omnifarious · Accepted Answer · 2011-04-18T20:46:11.497

Here is how your stuff should read:

#include <fstream>
#include <iostream>
#include <string>
#include <stdexcept>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    void Serialize(::std::ostream &os);
    static SerializeTestStruct Deserialize(::std::istream &is);
};

void SerializeTestStruct::Serialize(std::ostream &os)
{
    if (os.good())
    {
        os.write((char*)&mCharVal, sizeof(mCharVal));
        os.write((char*)&mIntVal, sizeof(mIntVal));
    }
}

SerializeTestStruct SerializeTestStruct::Deserialize(std::istream &is)
{
    SerializeTestStruct retval;

    if (is.good())
    {
        is.read((char*)&retval.mCharVal, sizeof(retval.mCharVal));
        is.read((char*)&retval.mIntVal, sizeof(retval.mIntVal));
    }
    if (is.fail()) {
        throw ::std::runtime_error("failed to read full struct");
    }
    return retval;
}

int main(int argc, const char *argv[])
{
//ultra basic serialization test.

    // setup
    const ::std::string testFileName = "test.bin";

    // write
    {
        SerializeTestStruct testStruct;
        testStruct.mCharVal = 'y';
        testStruct.mIntVal = 9;

        // Open the file once, in binary mode. Calling open() on a stream
        // that is already open fails and sets failbit.
        ::std::ofstream fileOut(testFileName.c_str(),
                                std::ofstream::binary|std::ofstream::out);
        testStruct.Serialize(fileOut);
    }

    // read
    {
        ::std::ifstream fileIn (testFileName.c_str(),
                                std::ifstream::in|std::ifstream::binary);
        if (fileIn.is_open())
        {
            SerializeTestStruct testStruct =
                SerializeTestStruct::Deserialize(fileIn);

            ::std::cout << "testStruct.mCharVal == '" << testStruct.mCharVal
                        << "' && testStruct.mIntVal == " << testStruct.mIntVal
                        << '\n';
        }
    }
    return 0;
}

Style issues:

Don't use new to create things if you can help it. Stack allocated objects are usually what you want and significantly easier to manage than the arbitrary lifetime objects you allocate from the heap. If you do use new, consider using a smart pointer type of some kind to help manage the lifetime for you.
Serialization and deserialization code should be matched up so that they can be examined and altered together. This makes maintenance of such code much easier.
Rely on C++ to clean things up for you with destructors; that's what they're for. This means wrapping parts of your code in their own blocks when the scope of the variables they use is relatively confined.
Don't needlessly use flags.

Mistakes...

Don't use the data member function of ::std::string.
Using placement new and that memory block thing is a really bad idea because it's ridiculously complex. And if you did use it, you would not use array delete in the way you did. And lastly, it won't work anyway for a reason explained later.
Do not use ofstream as the type taken by your Serialize function, as it is a derived class whose features you don't need. You should always use the most base class in a hierarchy that has the features you need unless you have a very specific reason not to. Serialize works fine with the features of the base ostream class, so use that type instead.
The on-disk layout of your structure and the in memory layout do not match, so your placement new technique is doomed to fail. As a rule, if you have a serialize function, you need a matching deserialize function.
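To illustrate the point about taking the base class, here is a minimal sketch, assuming the same two-field struct as in the question. Because Serialize takes a std::ostream&, it can write into an in-memory std::ostringstream just as easily as into a file. SerializedSize is a hypothetical helper added only for this illustration.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;

    // Takes the base class, so any output stream will do.
    void Serialize(std::ostream &os) const
    {
        os.write(reinterpret_cast<const char *>(&mCharVal), sizeof(mCharVal));
        os.write(reinterpret_cast<const char *>(&mIntVal), sizeof(mIntVal));
    }
};

// Hypothetical helper: serialize into memory and report how many bytes
// the two write() calls produced.
std::size_t SerializedSize(const SerializeTestStruct &s)
{
    std::ostringstream buffer;
    s.Serialize(buffer);
    return buffer.str().size();
}
```

The returned size is sizeof(char) + sizeof(unsigned int), which is also a handy way to check in a unit test what actually lands in the file, without touching the disk.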

Here is a further explanation of your memory layout issue. The structure, in memory, on an x86_64 based Linux box looks like this:

+------------+-----------+
|Byte number | contents  |
+============+===========+
|          0 |     0x79  |
|            | (aka 'y') |
+------------+-----------+
|          1 |   padding |
+------------+-----------+
|          2 |   padding |
+------------+-----------+
|          3 |   padding |
+------------+-----------+
|          4 |         9 |
+------------+-----------+
|          5 |         0 |
+------------+-----------+
|          6 |         0 |
+------------+-----------+
|          7 |         0 |
+------------+-----------+

The contents of the padding section are undefined, but generally 0. It doesn't matter though because that space is never used and merely exists so that access to the following int lies on an efficient 4-byte boundary.

The size of your structure on disk is 5 bytes, and is completely missing the padding sections. So that means when you read it into memory it won't line up properly with the in memory structure at all and accessing it is likely to cause some kind of horrible problem.
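If you want to see the mismatch on your own compiler, a small sketch like the following can help. It assumes the struct from the question with the member function removed; the constant names are made up for this example. On a typical x86_64 Linux build I would expect 5 bytes on disk, 8 in memory, and the int at offset 4, but none of those numbers are guaranteed by the language, so the code only asserts the parts that must hold everywhere.

```cpp
#include <cstddef>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;
};

// Bytes the two write() calls put in the file: the fields only, no padding.
constexpr std::size_t kBytesOnDisk = sizeof(char) + sizeof(unsigned int);

// Bytes the struct occupies in memory, including any padding.
constexpr std::size_t kBytesInMemory = sizeof(SerializeTestStruct);

// Where the int actually starts inside the struct.
constexpr std::size_t kIntOffset = offsetof(SerializeTestStruct, mIntVal);

// The int has to start on a boundary suitable for its type; that
// requirement is what makes the compiler insert padding after the char.
static_assert(kIntOffset % alignof(unsigned int) == 0,
              "mIntVal must be suitably aligned");

// The in-memory object can never be smaller than the fields it holds.
static_assert(kBytesInMemory >= kBytesOnDisk,
              "struct cannot be smaller than its fields");
```

Whenever kBytesInMemory is larger than kBytesOnDisk, reading the file straight into a block of memory and treating it as the struct puts the int bytes where the struct expects padding.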

The first rule: if you need a serialize function, you need a deserialize function. Second rule: unless you really know exactly what you are doing, do not dump raw memory into a file. This will work just fine in many cases, but there are important cases in which it won't work. And unless you are aware of what does and doesn't work, and when it does or doesn't work, you will end up with code that seems to work OK in certain test situations, but fails miserably when you try to use it in a real system.

My code still does dump memory into a file. And it should work as long as you read the result back on exactly the same architecture and platform with code compiled with the same version of the compiler as when you wrote it. As soon as one of those variables changes, all bets are off.
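One way to reduce that dependence is to fix the file format yourself instead of copying whatever the compiler put in memory. The following is only a sketch under assumptions that are mine and not part of the code above: the integer is stored as exactly 32 bits, least significant byte first, and WritePortable and ReadPortable are hypothetical names.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

struct SerializeTestStruct
{
    char mCharVal;
    std::uint32_t mIntVal;  // fixed width, assumed for this sketch
};

// Writes 1 byte for the char, then 4 bytes for the int, low byte first.
void WritePortable(std::ostream &os, const SerializeTestStruct &s)
{
    unsigned char bytes[5];
    bytes[0] = static_cast<unsigned char>(s.mCharVal);
    for (int i = 0; i < 4; ++i)
        bytes[1 + i] = static_cast<unsigned char>((s.mIntVal >> (8 * i)) & 0xFFu);
    os.write(reinterpret_cast<const char *>(bytes), sizeof(bytes));
}

// Reads the same 5 bytes back and reassembles the int from its bytes.
SerializeTestStruct ReadPortable(std::istream &is)
{
    unsigned char bytes[5];
    if (!is.read(reinterpret_cast<char *>(bytes), sizeof(bytes)))
        throw std::runtime_error("failed to read full struct");

    SerializeTestStruct s;
    s.mCharVal = static_cast<char>(bytes[0]);
    s.mIntVal = 0;
    for (int i = 0; i < 4; ++i)
        s.mIntVal |= static_cast<std::uint32_t>(bytes[1 + i]) << (8 * i);
    return s;
}
```

Because the shifts operate on values rather than on memory, the same bytes come out regardless of the machine's native byte order or the struct's padding.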

score 1 · Answer 2 · answered Apr 18 '11 at 19:34

bool isError = (false == ofs.good());
if (false == isError)
{
    ofs.write((char*)&mCharVal, sizeof(mCharVal));
    ofs.write((char*)&mIntVal, sizeof(mIntVal));
}

change to

if ( ofs.good() )
{
    ofs.write((char*)&mCharVal, sizeof(mCharVal));
    ofs.write((char*)&mIntVal, sizeof(mIntVal));
}

I would do:

std::ostream & operator << ( std::ostream &os, const SerializeTestStruct &mystruct )
{
  if ( os.good() )
  {
    os.write((const char*)&mystruct.mCharVal, sizeof(mystruct.mCharVal));
    os.write((const char*)&mystruct.mIntVal, sizeof(mystruct.mIntVal));
  }
  return os;
}

score 1 · Answer 3 · edited May 23 '17 at 12:23

The problem is here:

SerializeTestStruct* testStruct2 = new(memblock) SerializeTestStruct();

This will construct a value-initialized object of type SerializeTestStruct in the previously allocated memory. It will fill memblock with zeros, since value-initialization is zero-initialization for POD types.

Here's a quick fix for your code:

SerializeTestStruct* testStruct2 = new SerializeTestStruct;
fileIn.read( (char*)&testStruct2->mCharVal, sizeof(testStruct2->mCharVal) );
fileIn.read( (char*)&testStruct2->mIntVal, sizeof(testStruct2->mIntVal) );
fileIn.close();
// do something with testStruct2
// ...
delete testStruct2;
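The zeroing effect is easy to reproduce in isolation. This is a small sketch, assuming the struct from the question without its member function; both function names are invented for the example. The first function mimics what the question does, and the second copies the bytes into an object that already exists.

```cpp
#include <cstring>
#include <new>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;
};

// Placement new with "()" value-initializes the object, which for this
// struct means zero-initialization, so the bytes in the block are lost.
unsigned int IntAfterValueInitPlacementNew()
{
    alignas(SerializeTestStruct) unsigned char block[sizeof(SerializeTestStruct)];
    std::memset(block, 0x7F, sizeof(block));  // stand-in for data read from a file
    SerializeTestStruct *p = new (block) SerializeTestStruct();
    return p->mIntVal;
}

// Copying the bytes into an existing object keeps them.
unsigned int IntAfterMemcpy()
{
    unsigned char block[sizeof(SerializeTestStruct)];
    std::memset(block, 0x7F, sizeof(block));
    SerializeTestStruct s;
    std::memcpy(&s, block, sizeof(s));
    return s.mIntVal;
}
```

Note that the memcpy variant only gives meaningful values when the bytes already have the in-memory layout of the struct, padding included, which is not the case for the 5-byte file written by Serialize.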

edited May 23 '17 at 12:23 by Community · answered Apr 18 '11 at 19:47 by Kirill V. Lyadvinsky

That would certainly be why I'm seeing a zero-initialized structure. What do I have to do then to fully recreate my structure as before? – MukMuk Apr 18 '11 at 20:26
You need to allocate and initialize the memory *before* you read the data from file. Look at the sample in my answer. – Kirill V. Lyadvinsky Apr 18 '11 at 20:54

score 0 · Answer 4 · edited Apr 25 '16 at 08:26

Am I the only one that finds this totally opaque:

bool isError = (false == ofs.good());
if (false == isError) {
    // stuff
}

why not:

if ( ofs ) {
    // stuff
}
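One caveat worth knowing: if (ofs) and if (ofs.good()) are close but not identical. The conversion to bool is equivalent to !fail(), so it ignores eofbit, while good() is false as soon as any state bit is set. For an output stream this rarely matters, but on input it does. A small sketch (StreamState and the function name are invented for illustration):

```cpp
#include <sstream>

struct StreamState
{
    bool good;    // result of good()
    bool asBool;  // result of the conversion used by "if (stream)"
    bool eof;     // result of eof()
};

// Reads one int from text that ends right after the number.
StreamState StateAfterReadingLastToken()
{
    std::istringstream in("42");
    int value = 0;
    in >> value;  // succeeds, but reaches end of input while scanning digits
    return StreamState{in.good(), static_cast<bool>(in), in.eof()};
}
```

Here the read succeeded, so the stream still converts to true, but good() already reports false because eofbit is set.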

edited Apr 25 '16 at 08:26 by jotik · answered Apr 18 '11 at 19:43

I agree! I apologize for that. I believe it's a bit of a holdover from when I was playing around with flags earlier and it should certainly be cleaned up. Fixing it doesn't solve the core issue however. – MukMuk Apr 18 '11 at 19:45

score 0 · Answer 5 · answered Apr 18 '11 at 20:54

In my opinion, you need to allow serialization to a buffer and not directly to a stream. Writing to a buffer allows nested or inherited classes to write to memory first; then the whole buffer can be written to the stream in one call. Writing bits and pieces to the stream is not efficient.

Here is something I concocted, before I stopped writing binary data to streams:

#include <cstddef>
#include <cstring>
#include <fstream>

struct Serialization_Interface
{
    virtual ~Serialization_Interface() {}

    //!  Returns size occupied on a stream.
    /*! Note:  size on the platform may be different.
     *  This method is used to allocate memory.
     */
    virtual std::size_t  size_on_stream(void) const = 0;

    //!  Stores the fields of the object to the given pointer.
    /*!  Pointer is incremented by the size on the stream.
     */
    virtual void    store_to_buffer(unsigned char *& p_buffer) const = 0;

    //!  Loads the object's fields from the buffer, advancing the pointer.
    virtual void    load_from_buffer(const unsigned char *& p_buffer) = 0;
};

struct Serialize_Test_Structure
  : Serialization_Interface
{
    char mCharVal;
    int  mIntVal;

    std::size_t  size_on_stream(void) const
    {
        return sizeof(mCharVal) + sizeof(mIntVal);
    }

    void  store_to_buffer(unsigned char *& p_buffer) const
    {
        *p_buffer++ = mCharVal;
        // memcpy avoids an unaligned, type-punned write through the buffer.
        std::memcpy(p_buffer, &mIntVal, sizeof(mIntVal));
        p_buffer += sizeof(mIntVal);
    }

    void  load_from_buffer(const unsigned char *& p_buffer)
    {
         mCharVal = *p_buffer++;
         // memcpy avoids an unaligned, type-punned read from the buffer.
         std::memcpy(&mIntVal, p_buffer, sizeof(mIntVal));
         p_buffer += sizeof(mIntVal);
    }
};


int main(void)
{
   Serialize_Test_Structure myStruct;
   myStruct.mCharVal = 'G';
   myStruct.mIntVal = 42;

   // Allocate a buffer:
   unsigned char * buffer = new unsigned char[myStruct.size_on_stream()];

   // Create output file in binary mode.
   std::ofstream outfile("data.bin", std::ios::binary);

   // Does your design support this concept?
   unsigned char * p_buffer = buffer;
   myStruct.store_to_buffer(p_buffer);
   outfile.write((char *) buffer, myStruct.size_on_stream());

   outfile.close();
   delete[] buffer;
   return 0;
}

I stopped writing binary data to streams in favor of textual data because textual data doesn't have to worry about endianness or which IEEE floating point format is accepted by the receiving platform.
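For comparison, a textual round trip of the same two fields could look roughly like this. It is a sketch, not a recommendation of a specific format: the struct is the one from the question, WriteText and ReadText are hypothetical names, and the choice to store the char as its numeric code (so that whitespace characters survive extraction) is my own assumption. The numeric code still depends on both sides using the same character encoding.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

struct SerializeTestStruct
{
    char mCharVal;
    unsigned int mIntVal;
};

// Writes both fields as decimal numbers separated by a space.
void WriteText(std::ostream &os, const SerializeTestStruct &s)
{
    os << static_cast<int>(s.mCharVal) << ' ' << s.mIntVal << '\n';
}

// Reads the two numbers back; throws if either extraction fails.
SerializeTestStruct ReadText(std::istream &is)
{
    int charCode = 0;
    unsigned int intVal = 0;
    if (!(is >> charCode >> intVal))
        throw std::runtime_error("failed to read full struct");

    SerializeTestStruct s;
    s.mCharVal = static_cast<char>(charCode);
    s.mIntVal = intVal;
    return s;
}
```

The file is larger and slower to parse than the binary version, but it can be read on any platform and inspected in a text editor.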
