I am investigating a port from a non-standard string to a standard string in an application that uses boost::archive. The non-standard string has its (de-)serialization defined in the non-intrusive style, as shown in the example below. Serialization and deserialization work as expected, but when the ported application receives an old message, it crashes with a bad allocation. This is caused by the insertion of 5 bytes (all zero) before the size of the string.

What causes the insertion of these 5 extra bytes? Is this some kind of magic marker?

Example:

#include <iostream>
#include <string>
#include <sstream>
#include <boost/serialization/split_free.hpp>
#include <boost/archive/binary_oarchive.hpp>

struct own_string { // simplified custom string class
    std::string content;
};

namespace boost
{
    namespace serialization
    {
        template<class Archive>
        inline void save(
            Archive & ar,
            const own_string & t,
            const unsigned int /* file_version */)
        {
            size_t size = t.content.size();
            ar << size;
            ar.save_binary(&t.content[0], size);
        }

        template<class Archive>
        inline void load(
            Archive & ar,
            own_string & t,
            const unsigned int /* file_version */)
        {
            size_t size;
            ar >> size;
            t.content.resize(size);
            ar.load_binary(&t.content[0], size);
        }

// split non-intrusive serialization function member into separate
// non intrusive save/load member functions
        template<class Archive>
        inline void serialize(
            Archive & ar,
            own_string & t,
            const unsigned int file_version)
        {
            boost::serialization::split_free(ar, t, file_version);
        }

    } // namespace serialization
} // namespace boost

std::string string_to_hex(const std::string& input)
{
    static const char* const lut = "0123456789ABCDEF";
    size_t len = input.length();

    std::string output;
    output.reserve(2 * len);
    for (size_t i = 0; i < len; ++i)
    {
        const unsigned char c = input[i];
        output.push_back(lut[c >> 4]);
        output.push_back(lut[c & 15]);
    }
    return output;
}

void test_normal_string()
{
    std::stringstream ss;
    boost::archive::binary_oarchive ar{ss};

    std::string test = "";

    std::cout << string_to_hex(ss.str()) << std::endl;
    ar << test;

    //adds 00 00 00 00 00 00 00 00
    std::cout << string_to_hex(ss.str()) << std::endl;
}

void test_own_string()
{
    std::stringstream ss;
    boost::archive::binary_oarchive ar{ss};

    std::string test = "";

    own_string otest{test};
    std::cout << string_to_hex(ss.str()) << std::endl;
    ar << otest;

    //adds 00 00 00 00 00 00 00 00 00 00 00 00 00
    std::cout << string_to_hex(ss.str()) << std::endl;
}

int main()
{
    test_normal_string();
    test_own_string();
}
choeger

2 Answers

So, you'd want to deserialize a previously serialized own_string as if it were a std::string.

From the Boost (1.65.1) documentation:

By default, for each class serialized, class information is written to the archive. This information includes version number, implementation level and tracking behavior. This is necessary so that the archive can be correctly deserialized even if a subsequent version of the program changes some of the current trait values for a class. The space overhead for this data is minimal. There is a little bit of runtime overhead since each class has to be checked to see if it has already had its class information included in the archive. In some cases, even this might be considered too much. This extra overhead can be eliminated by setting the implementation level class trait to: boost::serialization::object_serializable.

Now, probably(*) this is the default for standard classes. In fact, adding

BOOST_CLASS_IMPLEMENTATION(own_string, boost::serialization::object_serializable)

at global scope makes both test_X_string functions produce the same bytes. This should explain the observed difference in extra bytes.

That said, I failed to find any specific guarantee concerning the serialization traits of standard classes (others may know better than me).

(*) actually the section about portability of traits settings mentions that:

Another way to avoid this problem is to assign serialization traits to all specializations of the template my_wrapper for all primitive types so that class information is never saved. This is what has been done for our implementation of serializations for STL collections

so this may give you enough confidence that standard collections (hence including std::string) will give the same bytes in this case.

Massimiliano Janes

I think you're asking for undocumented implementation details. There doesn't have to be a why. It's an implementation detail of the archive format.

It's an interesting question.

You have to tell the library that you don't want all features for your type (object tracking, type information, versioning). Specifically, the sample below demonstrates how to achieve the same footprint as std::string.

Note that you obviously lose the functionality you disable.

#include <iostream>
#include <string>
#include <sstream>

struct own_string { // simplified custom string class
    std::string content;
};

#include <boost/serialization/split_free.hpp>
#include <boost/serialization/tracking.hpp>
BOOST_CLASS_IMPLEMENTATION(own_string, boost::serialization::level_type::object_serializable)
BOOST_CLASS_TRACKING(own_string, boost::serialization::track_never)

//#include <boost/serialization/wrapper.hpp>
//BOOST_CLASS_IS_WRAPPER(own_string)

#include <boost/serialization/array_wrapper.hpp>

namespace boost
{
    namespace serialization
    {
        template<class Archive>
        inline void save(
            Archive & ar,
            const own_string & t,
            const unsigned int /* file_version */)
        {
            size_t size = t.content.size();
            ar & size;
            if (size)
                ar & boost::serialization::make_array(&t.content[0], size);
        }

        template<class Archive>
        inline void load(
            Archive & ar,
            own_string & t,
            const unsigned int /* file_version */)
        {
            size_t size;
            ar & size;
            t.content.resize(size);
            if (size)
                ar & boost::serialization::make_array(&t.content[0], size);
        }

        // split non-intrusive serialization function member into separate
        // non intrusive save/load member functions
        template<class Archive>
        inline void serialize(
            Archive & ar,
            own_string & t,
            const unsigned int file_version)
        {
            boost::serialization::split_free(ar, t, file_version);
        }

    } // namespace serialization
} // namespace boost

std::string string_to_hex(const std::string& input)
{
    static const char* const lut = "0123456789ABCDEF";
    size_t len = input.length();

    std::string output;
    output.reserve(2 * len);
    for (size_t i = 0; i < len; ++i)
    {
        const unsigned char c = input[i];
        output.push_back(lut[c >> 4]);
        output.push_back(lut[c & 15]);
    }
    return output;
}

#include <boost/archive/binary_oarchive.hpp>

void test_normal_string()
{
    std::stringstream ss;
    {
        boost::archive::binary_oarchive ar{ss, boost::archive::no_header|boost::archive::no_codecvt};

        std::string test = "";

        //std::cout << string_to_hex(ss.str()) << std::endl;
        ar << test;
    }

    //adds 00 00 00 00 00 00 00 00
    std::string bytes = ss.str();
    std::cout << string_to_hex(bytes) << " (" << bytes.size() << " bytes)\n";
}

void test_own_string()
{
    std::stringstream ss;
    {
        boost::archive::binary_oarchive ar{ss, boost::archive::no_header|boost::archive::no_codecvt};

        own_string otest{""};
        //std::cout << string_to_hex(ss.str()) << std::endl;
        ar << otest;
    }

    // with the traits above this now adds only 00 00 00 00 00 00 00 00 (was 13 bytes)
    std::string bytes = ss.str();
    std::cout << string_to_hex(bytes) << " (" << bytes.size() << " bytes)\n";
}

int main()
{
    test_normal_string();
    test_own_string();
}

Prints

0000000000000000 (8 bytes)
0000000000000000 (8 bytes)

Note that the sample also removes other sources of noise and overhead, for example by passing no_header so that the archive header is not written.

sehe