
I'm not a C++ expert, but I've serialized things a couple of times in the past. Unfortunately, this time I'm trying to serialize a class which contains a std::string, which I understand is pretty much like serializing a pointer.

I can write out the class to a file and read it back in again. All int fields are fine, but the std::string field gives an "address out of bounds" error, presumably because it points to data which is no longer there.

Is there a standard workaround for this? I don't want to go back to char arrays, but at least I know they work in this situation. I can provide code if necessary, but I'm hoping I've explained my problem well.

I'm serializing by casting the class to a char* and writing it to a file with std::fstream. Reading of course is just the reverse.
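A minimal sketch of why the cast-and-write approach works for the int fields but not for the string. Copying an object's raw bytes is only meaningful for trivially copyable types, and `std::string` is not one, because it manages its characters (at least for longer strings) through an internal pointer to separately allocated storage. The `Point` struct and the `write_raw`/`read_raw` helpers are hypothetical, for illustration only.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical struct containing only ints: trivially copyable, so its bytes
// can be written out and read back by the same program on the same platform.
struct Point {
    int x;
    int y;
};

static_assert(std::is_trivially_copyable<Point>::value,
              "Point can be copied as raw bytes");

// std::string holds a pointer to its character storage, so its raw bytes
// do not reliably contain the characters themselves.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string must be serialized member by member");

// Write the raw object representation of a trivially copyable value.
template <typename T>
void write_raw(std::ostream& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Read the raw bytes back into value; returns false if the stream ran short.
template <typename T>
bool read_raw(std::istream& in, T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&value), sizeof(T)));
}
```

Even for such types the byte layout depends on the compiler and platform (size of `int`, padding, endianness), so this is only reasonable for files read back by the same build.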

codeling
iwasinnamuknow
  • IMO, you'll have to dump string data manually. Prepare a plain structure which has a char buffer and the string length and serialize it instead of original object. – RocketR Aug 12 '11 at 21:02
  • It seems the only real issue is how you delimit the string, but you would face that issue with a char array as well. I'm not seeing where you are having trouble; serialising a string seems very easy to me. You'd probably better post some code. – john Aug 12 '11 at 21:05
  • Java has standard serialization (in its standard library). C++ has no such functionality, neither in the language nor in the STL. There are external libraries for this; e.g. Boost can do it. Another option is Google's Protocol Buffers. – osgx Aug 12 '11 at 21:06
  • Nitpicking: you're serializing an _object_. – xtofl Aug 12 '11 at 21:06
  • An intermediate structure does make sense to me. It does beg the question why I'm bothering with these strings in the first place, it seems to be a false economy in the long run. – iwasinnamuknow Aug 12 '11 at 21:08
  • I will edit the original post with some code shortly. A little further explanation though. When the object is written, the ints are written as numbers, but the string is written as a pointer address instead of characters. Hence unless memory remains unchanged, the string is lost on reading. – iwasinnamuknow Aug 12 '11 at 21:12
  • I'm guessing you are doing this `out << &str;`, that's the wrong way to do it. – john Aug 12 '11 at 21:17
  • If you're on Linux, another good method is to construct an array of IOVs and pass it to the `writev` function (http://linux.die.net/man/2/writev) to write everything in one shot. – RocketR Aug 12 '11 at 21:18
  • @iwas: "False economy"? You mean apart from the automatic memory management, integration with streams, etc., etc.? – Oliver Charlesworth Aug 12 '11 at 21:21
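RocketR's suggestion of an intermediate plain structure (a char buffer plus the string length) might look roughly like this sketch. `NameRecord`, its 64-byte capacity, and the helpers `to_record`/`from_record` are hypothetical choices; strings longer than the buffer are silently truncated, which is the main drawback compared with writing a length followed by the characters.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical fixed-size record that can be written as raw bytes.
struct NameRecord {
    std::uint32_t len;  // number of valid bytes in buf
    char buf[64];       // fixed capacity; longer strings are truncated
};

static_assert(std::is_trivially_copyable<NameRecord>::value,
              "NameRecord must be trivially copyable to be written as raw bytes");

// Copy a std::string into the plain record, truncating if necessary.
NameRecord to_record(const std::string& s) {
    NameRecord r{};  // zero-initialized, so unused bytes are deterministic
    r.len = static_cast<std::uint32_t>(std::min(s.size(), sizeof(r.buf)));
    std::memcpy(r.buf, s.data(), r.len);
    return r;
}

// Rebuild the std::string after the record has been read back.
std::string from_record(const NameRecord& r) {
    std::size_t n = std::min<std::size_t>(r.len, sizeof(r.buf));  // guard against a corrupt len
    return std::string(r.buf, n);
}
```

The `NameRecord` itself can then be written with the same cast-to-`char*` technique the question uses, subject to the same platform caveats.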

6 Answers


I'm serializing by casting the class to a char* and writing it to a file with fstream. Reading of course is just the reverse.

Unfortunately, this only works as long as there are no pointers involved. You might want to give your classes std::ostream& MyClass::serialize(std::ostream&) const and std::istream& MyClass::deserialize(std::istream&) (streams cannot be copied, so they must be passed by reference), and call those. For this case, you'd want:

// requires <vector> and <cstddef>
std::ostream& MyClass::serialize(std::ostream &out) const {
    out << height;
    out << ','; //number separator
    out << width;
    out << ','; //number separator
    out << name.size(); //serialize size of string
    out << ','; //number separator
    out << name; //serialize characters of string
    return out;
}
std::istream& MyClass::deserialize(std::istream &in) {
    if (in) {
        std::size_t len = 0;
        char comma;
        in >> height;
        in >> comma; //read in the separator
        in >> width;
        in >> comma; //read in the separator
        in >> len;  //deserialize size of string
        in >> comma; //read in the separator
        if (in && len) {
            std::vector<char> tmp(len);
            //deserialize exactly len characters of string
            if (in.read(tmp.data(), len))
                name.assign(tmp.data(), len);
        } else if (in) {
            name.clear(); //empty string was stored
        }
    }
    return in;
}

You may also want to overload the stream operators for easier use.

std::ostream &operator<<(std::ostream& out, const MyClass &obj)
{obj.serialize(out); return out;}
std::istream &operator>>(std::istream& in, MyClass &obj)
{obj.deserialize(in); return in;}
Mooing Duck
  • Looks interesting and not too disruptive to the existing code/workflow. I'll have a play. Thanks. – iwasinnamuknow Aug 12 '11 at 21:24
  • (1) Your streams need to be passed by reference; istream and ostream copy constructors are disabled. (2) width and height and the size of the string will be concatenated together on output, so reading them back in will result in a single number. – Benjamin Lindley Aug 12 '11 at 21:41
  • `in.read(&name[0], len);` that is surely wrong. You cannot treat a string like a vector. And even as a vector it would fail if len == 0. – john Aug 12 '11 at 21:43
  • @john: agreed. An intermediate `char *nameValue = new char[len + 1];` seems to be required. – Rudy Velthuis Aug 12 '11 at 21:56
  • @Benjamin Lindley: whoops, I forgot to make them by reference. My bad. – Mooing Duck Aug 12 '11 at 22:08
  • @John and Rudy: nope. std::string (non-const) operator[] returns a char&, so the address of that is a char* into the string. Since I've resized to the exact length, this is all defined behavior, and works. (Although it would have failed if len was zero) – Mooing Duck Aug 12 '11 at 22:09
  • Writing using `c_str()` is going to create problems if the string has embedded NUL chars (\0), because fewer characters will be written. You should either write the correct number of characters by looping over the string, or write `strlen(c_str())` instead of `size` if you want to drop the characters after the first NUL. Writing `size` and `c_str` will make you read corrupted data if a NUL is stored in the string. – 6502 Aug 12 '11 at 22:21
  • @John and Rudy: I stand corrected, std::string is not guaranteed to be contiguous. Temporary is required. – Mooing Duck Aug 12 '11 at 22:26
  • Wish I could reset the votes and uncheck accepted answer, this code has changed quite a bit since it was accepted :( – Mooing Duck Aug 12 '11 at 22:33
  • `auto_ptr` should not be used for arrays. It only calls `delete`, not `delete[]`. Try a `vector`; you can read directly into it from a file using `&v[0]`. – Benjamin Lindley Aug 13 '11 at 15:57
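To try the member-function approach from the answer above end to end, here is a self-contained sketch. The answer does not show the class definition, so `height` and `width` are assumed to be `int` and `name` a `std::string`; the free `operator<<`/`operator>>` simply forward to the members as the answer suggests. Because the characters are read with `read()` and an explicit length, commas, spaces, and newlines inside `name` survive the round trip.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Assumed class layout; the answer only shows the member functions.
struct MyClass {
    int height = 0;
    int width = 0;
    std::string name;

    // Text format: height,width,length,characters
    std::ostream& serialize(std::ostream& out) const {
        out << height << ',' << width << ',' << name.size() << ',' << name;
        return out;
    }

    std::istream& deserialize(std::istream& in) {
        std::size_t len = 0;
        char comma = 0;
        in >> height >> comma >> width >> comma >> len >> comma;
        if (!in) return in;
        if (len == 0) {
            name.clear();
        } else {
            // read() copies exactly len bytes, including separators and whitespace
            std::vector<char> tmp(len);
            if (in.read(tmp.data(), static_cast<std::streamsize>(len)))
                name.assign(tmp.data(), len);
        }
        return in;
    }
};

std::ostream& operator<<(std::ostream& out, const MyClass& obj) { return obj.serialize(out); }
std::istream& operator>>(std::istream& in, MyClass& obj) { return obj.deserialize(in); }
```

For example, an object with height 3, width 4 and name `"hello, world"` is written as the text `3,4,12,hello, world`.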

Simply writing the binary contents of an object into a file is not only unportable but, as you've recognized, doesn't work for pointer data. You basically have two options: either you write a real serialization library, which handles std::strings properly by e.g. using c_str() to output the actual string to the file, or you use the excellent Boost serialization library. If at all possible, I'd recommend the latter; you can then serialize with simple code like this:

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/string.hpp>

class A {
    private:
        std::string s;
    public:
        template<class Archive>
        void serialize(Archive& ar, const unsigned int version)
        {
            ar & s;
        }
};

Here, the function serialize works for serializing and deserializing the data, depending on how you call it. See the documentation for more information.

JoeG
Antti

The easiest serialization method for strings or other variable-size blobs is to first serialize the size, just as you serialize integers, and then copy the content to the output stream.

When reading you first read the size, then allocate the string and then fill it by reading the correct number of bytes from the stream.
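A minimal sketch of this size-then-content scheme, assuming a fixed 32-bit length prefix written in the machine's native byte order (so the files are not portable across platforms with different endianness). `write_string` and `read_string` are hypothetical names. Since exactly `size` bytes are copied in both directions, embedded NUL characters and whitespace are preserved.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Write the length as a fixed-width integer, then the raw characters.
// Note: the length is stored in the machine's native byte order.
void write_string(std::ostream& out, const std::string& s) {
    std::uint32_t size = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof size);
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Read the length first, allocate the string, then fill it with exactly that many bytes.
bool read_string(std::istream& in, std::string& s) {
    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof size)) return false;
    std::string tmp(size, '\0');
    if (size > 0 && !in.read(&tmp[0], static_cast<std::streamsize>(size))) return false;
    s.swap(tmp);  // only overwrite s once the whole string was read
    return true;
}
```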

An alternative is to use a delimiter and escaping, but this requires more code and is slower for both serialization and deserialization (although the result can be kept human-readable).
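For comparison, a sketch of the delimiter-and-escaping alternative. The specific choices are assumptions: each string ends with a newline, a backslash in the data is written as two backslashes, and a newline in the data is written as a backslash followed by `n`, so the delimiter never appears unescaped. `write_escaped` and `read_escaped` are hypothetical names.

```cpp
#include <sstream>
#include <string>

// Write s terminated by '\n', escaping '\\' and '\n' so the delimiter is unambiguous.
void write_escaped(std::ostream& out, const std::string& s) {
    for (char c : s) {
        if (c == '\\')      out << "\\\\";  // backslash -> two backslashes
        else if (c == '\n') out << "\\n";   // newline -> backslash + 'n'
        else                out << c;
    }
    out << '\n';
}

// Read up to the next unescaped '\n', undoing the escapes.
bool read_escaped(std::istream& in, std::string& s) {
    std::string result;
    char c;
    while (in.get(c)) {
        if (c == '\n') { s.swap(result); return true; }
        if (c == '\\') {
            if (!in.get(c)) return false;     // truncated escape sequence
            result += (c == 'n') ? '\n' : c;  // "\n" -> newline, "\\" -> backslash
        } else {
            result += c;
        }
    }
    return false;  // reached end of stream without a delimiter
}
```

The per-character loop is what makes this variant slower than copying a known number of bytes, but the output stays readable in a text editor.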

6502

You'll have to use a more complicated method of serialization than casting a class to a char* and writing it to a file if your class contains any exogenous data (string does). And you're correct about why you're getting a segmentation fault.

I would make a member function that takes an output stream and writes the object's contents to it, as well as an inverse function that takes an input stream and reads the data back in, so the object can be restored later, like this:

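#include <istream>
#include <ostream>
#include <string>

class MyClass {
public:
    MyClass() : str() { }

    // writes the characters of the string, not its internal pointer
    void serialize(std::ostream& out) const {
        out << str;
    }

    // note: operator>> on a string stops at the first whitespace character
    void restore(std::istream& in) {
        in >> str;
    }

    const std::string& data() const { return str; }

private:
    std::string str;
};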

MyClass c;
c.serialize(output);

// later
c.restore(input);

You can also define operator<< and operator>> to work with istream and ostream to serialize and restore your class as well if you want that syntactic sugar.

Seth Carnegie
  • Would the write/read actions act differently if used as member functions? I'm not really understanding how that would write the actual characters instead of the pointer address. – iwasinnamuknow Aug 12 '11 at 21:05
  • @iwasinnamuknow: No write and read actions don't act differently when used as member function, what gives you that idea? – john Aug 12 '11 at 21:09
  • @iwasinnamuknow It's using `operator<<` and `>>` of `(i|o)stream` on a `string` which is defined to write the contents of the string to file. You'd obviously have more data members than one string, so you'd just write them all to the output file and then read them in from the input file in the same order. – Seth Carnegie Aug 12 '11 at 21:09
  • @john it was just a quick example. – Seth Carnegie Aug 12 '11 at 21:10
  • This would stop reading at the first space in the string. – Costantino Grana Apr 06 '22 at 16:33
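As the last comment points out, `in >> str` stops at the first whitespace. If a text format is still wanted, one standard-library option (C++14 and later) is `std::quoted` from `<iomanip>`, which writes the string in quotes with embedded quotes and backslashes escaped, and reads it back including spaces. This is a swapped-in technique, not part of the answer above; `read_plain` and `round_trip_quoted` are hypothetical helpers used only to show the difference.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Plain operator>> on std::string stops at the first whitespace character.
std::string read_plain(const std::string& text) {
    std::istringstream in(text);
    std::string s;
    in >> s;
    return s;
}

// std::quoted (C++14) writes "..." with '"' and '\\' escaped,
// and reads the whole quoted string back, spaces included.
std::string round_trip_quoted(const std::string& original) {
    std::stringstream ss;
    ss << std::quoted(original);
    std::string restored;
    ss >> std::quoted(restored);
    return restored;
}
```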

Why not just something along the lines of:

std::ofstream ofs;
...

ofs << my_str;

and then:

std::ifstream ifs;
...

ifs >> my_str; 
Oliver Charlesworth
  • Wouldn't that assume that the string is separate from anything else? I'm trying to have the entire class and its contents written/read in one go. – iwasinnamuknow Aug 12 '11 at 21:03
  • If that's interspersed with other data, or if the string has spaces in it, the input will not be accurate. – Benjamin Lindley Aug 12 '11 at 21:04
  • Would that work with a string that contains spaces and/or newlines? – 6502 Aug 12 '11 at 21:04
  • @iwas: You cannot simply reinterpret a class as a `char *`. In general, serialization of objects requires (semi-)manually serializing each member variable in turn. I'm not quite sure what sort of solution you're looking for! – Oliver Charlesworth Aug 12 '11 at 21:05
  • @John: Good point, no it wouldn't. But then it wouldn't work with raw `char *`, either. – Oliver Charlesworth Aug 12 '11 at 21:06
  • @Oli: This is the point surely, the OP is claiming that serialising a std::string is somehow harder than serialising a char array. That's the bit I don't get and until he explains himself I don't think we're going to get very far. – john Aug 12 '11 at 21:11
  • @oli I'm following one of the multitude of guides that simply suggest casting the class and writing it out. This does work except where a pointer is concerned. And the string is acting like a pointer. A static length char array works fine as well in my quick test. I'm trimming up some code now. – iwasinnamuknow Aug 12 '11 at 21:18
  • @iwas: No, there are many more cases where that won't work. It can only possibly work for [PODs](http://en.wikipedia.org/wiki/Plain_old_data_structure) (plain-old data structures). – Oliver Charlesworth Aug 12 '11 at 21:20
  • @Oli I understand, I'm checking out the boost serialization now in addition to the other answers. – iwasinnamuknow Aug 12 '11 at 21:24
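#include <cstdint>
#include <istream>
#include <ostream>
#include <string>
#include <vector>

/*!
 * reads binary data into the string.
 * @status : OK.
*/

class UReadBinaryString
{
public: // must be public, objHeader calls read()
    static std::string read(std::istream &is, std::uint32_t size)
    {
        std::string returnStr;
        if(size > 0)
        {
            std::vector<char> buff(size);    // temporary buffer (replaces the custom smart pointer)
            if (is.read(buff.data(), size))  // only assign if all bytes arrived
                returnStr.assign(buff.data(), size);
        }

        return returnStr;
    }
};

class objHeader
{
public:
    std::string m_ID;

    // serialize: 4-byte length (native byte order), then the raw characters.
    // As a member operator this is called as `header << os`, not `os << header`.
    std::ostream &operator << (std::ostream &os) const
    {
        std::uint32_t size = static_cast<std::uint32_t>(m_ID.length());
        os.write(reinterpret_cast<const char*>(&size), sizeof(std::uint32_t));
        os.write(m_ID.data(), size);

        return os;
    }
    // de-serialize: read the length, then exactly that many bytes
    std::istream &operator >> (std::istream &is)
    {
        std::uint32_t size = 0;
        if (is.read(reinterpret_cast<char*>(&size), sizeof(std::uint32_t)))
            m_ID = UReadBinaryString::read(is, size);

        return is;
    }
};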
legion
  • @RocketR: Did I write union? Well, fixed. You know, it was a quick paste of code portions from some of my old project files. – legion Aug 12 '11 at 21:11