
I want to serialize and deserialize a class Mango, so I have created the functions serialize and deserialize.

? serialize(Mango &Man) /// What should be return ?
{
}

    
Mango deserialize(  ?   ) /// What should be function parameter ?
{
}

I don't know how to implement this efficiently in terms of speed, portability and memory, because the class contains 10 members of custom data types (I show only one, but they are all similar), which are themselves quite complex.

I would like suggestions for the implementation. For example, what should the return type of serialize be? A vector of bytes, i.e. std::vector<uint8_t> serialize(Mango &Man)? Or should it return nothing and just serialize into bytes stored in memory? Or some other way?

Mango Class

class Mango
{
public:
    const MangoType &getMangoType() const { return typeMan; }
    MangoType &getMangoType() { return typeMan; }

private:
    // There are many members of different types : I just mention one.
    MangoType typeMan;
};

Data type classes

//MangoType Class
class MangoType
{
    /// It only has one member ie content
public:
    /// Getter of content vector.

    std::vector<FuntionMango> &getContent() noexcept { return Content; }

private:
    /// \name Data of MangoType.
    
    std::vector<FuntionMango> Content;
    
};


/// FuntionMango class.
class FuntionMango
{
public:
    /// Getter of param types.
    const std::vector<ValType> &getParamTypes() const noexcept
    {
        return ParamTypes;
    }
    std::vector<ValType> &getParamTypes() noexcept { return ParamTypes; }

    /// Getter of return types.
    const std::vector<ValType> &getReturnTypes() const noexcept
    {
        return ReturnTypes;
    }
    std::vector<ValType> &getReturnTypes() noexcept { return ReturnTypes; }

    

private:
    /// \name Data of FuntionMango.
   
    std::vector<ValType> ParamTypes;
    std::vector<ValType> ReturnTypes;

};

//ValType Class
  
enum class ValType : uint8_t
  {
     #define UseValType
     #define Line(NAME, VALUE, STRING) NAME = VALUE
     #undef Line
     #undef UseValType
  };

I want to know the best possible implementation plan in terms of speed and memory for serialize and deserialize functions.

Note:

1) I do not want to transfer it over the network. My use case is that loading the data into the Mango class is very time-consuming every time (it is the result of a computation). So I want to serialize it, so that next time I need it I can just deserialize the previously serialized data.

2) I do not want to use a library that requires linking, such as Boost Serialization. Is there any way to use it header-only?

James_sheford
  • as you already mentioned `boost serialization`, I assume you already know what type it use. how about doing it? – apple apple Oct 04 '22 at 21:18
  • @appleapple , I don't know much about it..but I can't use it directly.. because it requires linking..and we allowed only to use header only versions of boost – James_sheford Oct 04 '22 at 21:23
  • Serialization / deserialization is a huge pain in the backside. Use boost? Or google protocol buffers? – Eljay Oct 04 '22 at 23:54
  • @Eljay can we use Boost Serialization as a header-only library? – James_sheford Oct 05 '22 at 05:20
  • @James_sheford Why would you like to serialize the data if you do not intend to transfer it to another machine? The point of serialization is that the data becomes compact and platform-independent. – Bart Oct 05 '22 at 09:26
  • @Bart , My usecase is that it is very time consuming to load data everytime in Mango class ( It comes after computation ). So I want to serialize it .. so that next time I want it , I can just deserialize the previous serialized data. – James_sheford Oct 05 '22 at 09:51
  • @James_sheford But serialization will add computational overhead and often increases the data size a bit to make it platform independent. Is serialization truly what you are looking for? Are you not looking for something like zipping? – Bart Oct 05 '22 at 10:00
  • @Bart, I don't want to make it platform independent.. I want to serialize it into a bytes of vectors or store it into memory check line 149 here https://www.onlinegdb.com/3SO2uK-l3e – James_sheford Oct 05 '22 at 10:04
  • Have a look here: [Binary Object Serialization With Structure Traversal & Reconstruction - Chris Ryan - CppNorth 2022](https://youtu.be/r51G_ECvIlk). It's a well-paced talk covering the fundamentals and provides an implementation you can use as is. It's communicated as *"not production ready"*, but certainly covers a wide range of use cases. If it doesn't cover your's, you can adjust it as needed. – IInspectable Oct 05 '22 at 10:30
  • Can you clarify the use case? Serialization is usually done for IO, either to disk or to the network. So, do you need to store and load it from disk? Because otherwise, you can keep just the binary representation in memory as an instance of the class Mango. If speed is the only requirement, I would save a binary dump and load it again. But for that the classes must be POD-like. So, if all the class members are trivial types, it is possible. But dangerous as well. There are many things that may cause UB if not used correctly. – TeaAge Solutions Oct 05 '22 at 11:10
  • I want to store and load it from disk ....or.....get the vector of bytes after serializing check line 149 https://www.onlinegdb.com/3SO2uK-l3e .. Can you show the binary dump method ? – James_sheford Oct 05 '22 at 11:19
  • Wait, are you saving it to disk, or just storing it as bytes in memory? If you are storing it as bytes in memory, why serialize, just make a copy of the data? Serialization is not a magic way to make things faster; it is usually slower to serialize than not. Serialization is for when you need to change address spaces, either because you want data for the next time you run the program, or run it over the network, or pass it to another machine. – Yakk - Adam Nevraumont Oct 05 '22 at 13:59
  • @Yakk-AdamNevraumont I am pretty sure ..I want to serialize the data – James_sheford Oct 05 '22 at 14:51
  • Perhaps the examples here give you some inspiration. It's possible to write them without any boost, obviously https://stackoverflow.com/a/26568212/85371 – sehe Oct 05 '22 at 14:52
  • @James_sheford You are asking "how best to do it", and you aren't describing your use case very well. Best serialization (in time and memory) to/from an in-memory buffer is different than best serialization to/from disk, and which you should use depends on why you want to do it. – Yakk - Adam Nevraumont Oct 05 '22 at 15:00
  • @Yakk-AdamNevraumont, Sorry I'm not sure about in memory buffer vs from disk. but my use case is to serialize the loaded data into a memory. And deserialized this memory multiple times. – James_sheford Oct 05 '22 at 15:35
  • @James_sheford And why serialize and not just copy the loaded data? – Yakk - Adam Nevraumont Oct 05 '22 at 16:02
  • @Yakk-AdamNevraumont , Sorry but what you mean by "just copy the loaded data "? If I just store the data as it .. Will it not occupy more space compare to serialization? – James_sheford Oct 05 '22 at 16:42
  • @James_sheford Why would it be significantly bigger? Serialization isn't magic, it is just data written using a "local address space" as opposed to your computer's memory address space. It could be smaller, but that would be specific to the problem domain, for instance if your data is inefficiently stored in memory. Why do you think serialized data would be smaller here? – Yakk - Adam Nevraumont Oct 05 '22 at 17:51
  • @Yakk-AdamNevraumont..can you please show..how to copy the data and store in memory? – James_sheford Oct 05 '22 at 22:22
  • @Yakk-AdamNevraumont, I get the loaded data (in Mango Class ) from binary format itself. So what you think should I revere the same code and convert that again in binary form and store that ? Or serialize the loaded data( in Mango Class ) ? – James_sheford Oct 05 '22 at 22:50
  • In case you know..the binary format I mentioned is .wasm [binary format](https://blog.ttulka.com/learning-webassembly-2-wasm-binary-format/) and after doing some computation it get loaded as a module ( in class Mango ).. – James_sheford Oct 05 '22 at 22:58
  • You asked my input in comment to another answer, but I'm still not sure I would go for serialization in your use case. More likely I would use the result of the "computationally intensive" part as the input of my Mango constructor and perform computation in some independent code. Thus serialization would become a moot issue. – kriss Oct 06 '22 at 12:34

2 Answers


I commented:

Perhaps the examples here give you some inspiration. It's possible to write them without any boost, obviously: "Boost Serialization Binary Archive giving incorrect output" (https://stackoverflow.com/a/26568212/85371)

Because I hate when people say "obviously" on a Q&A site, let me show you. I'd suggest the interface to look like this:

std::vector<uint8_t> serialize(Mango const& Man);
Mango                deserialize(std::span<uint8_t const> data);

Alternatively, for file IO you could support e.g.:

void serialize_to_stream(std::ostream& os, Mango const& Man);
void deserialize(std::istream& is, Mango& Man);

Using the approach from the linked example, the suggested implementations would look like:

std::vector<uint8_t> serialize(Mango const& Man) {
    std::vector<uint8_t> bytes;
    do_generate(back_inserter(bytes), Man);
    return bytes;
}

Mango deserialize(std::span<uint8_t const> data) {
    Mango result;
    auto  f = begin(data), l = end(data);
    if (!do_parse(f, l, result))
        throw std::runtime_error("deserialize");
    return result;
}

void serialize_to_stream(std::ostream& os, Mango const& Man)  {
    do_generate(std::ostreambuf_iterator<char>(os), Man);
}

void deserialize(std::istream& is, Mango& Man) {
    Man = {}; // clear it!
    std::istreambuf_iterator<char> f(is), l{};
    if (!do_parse(f, l, Man))
        throw std::runtime_error("deserialize");
}

Of course, that assumes do_generate and do_parse customizations for all the relevant types (ValType, FuntionMango, MangoType, Mango):

Live On Coliru

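#include <algorithm>
#include <cstddef>     // size_t
#include <cstdint>     // uint8_t, uint16_t, uint32_t
#include <iomanip>     // debug output
#include <iostream>
#include <iterator>    // back_inserter, (i/o)streambuf_iterator
#include <span>
#include <stdexcept>   // runtime_error
#include <string>
#include <type_traits> // is_trivially_copyable_v, underlying_type_t
#include <vector>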

namespace MangoLib {
    // your requested signatures:
    class Mango;

    void serialize_to_stream(std::ostream& os, Mango const& Man);
    void deserialize(std::istream& is, Mango& Man);
    std::vector<uint8_t> serialize(Mango const& Man);
    Mango                deserialize(std::span<uint8_t const> data);

    // your specified types (with some demo fill)
    enum class ValType : uint8_t {
#define UseValType
#define Line(NAME, VALUE, STRING) NAME = VALUE
        Line(void_,   0, "void"),
        Line(int_,    1, "int"),
        Line(bool_,   2, "bool"),
        Line(string_, 3, "string"),
#undef Line
#undef UseValType
    };

    using ValTypes = std::vector<ValType>;
    class FuntionMango {
      public:
        const ValTypes& getParamTypes() const noexcept { return ParamTypes; }
        ValTypes& getParamTypes() noexcept { return ParamTypes; }

        const ValTypes& getReturnTypes() const noexcept { return ReturnTypes; }
        ValTypes& getReturnTypes() noexcept { return ReturnTypes; }

      private:
        ValTypes ParamTypes, ReturnTypes;
    };

    using FuntionMangos = std::vector<FuntionMango>;

    class MangoType {
      public:
        FuntionMangos&       getContent() noexcept { return Content; }
        const FuntionMangos& getContent() const noexcept { return Content; }

      private:
        FuntionMangos Content;
    };

    class Mango {
      public:
        const MangoType& getMangoType() const { return typeMan; }
        MangoType&       getMangoType() { return typeMan; }

      private:
        MangoType typeMan;
        // many other members
    };
} // namespace MangoLib

namespace my_serialization_helpers {

    ////////////////////////////////////////////////////////////////////////////
    // This namespace serves as an extension point for your serialization; in
    // particular we choose endianness and representation of strings
    //
    // TODO add overloads as needed (signed integer types, binary floats,
    // containers of... etc)
    ////////////////////////////////////////////////////////////////////////////
    
    // decide on the max supported container capacity:
    using container_size_type = std::uint32_t;
    
    ////////////////////////////////////////////////////////////////////////////
    // generators
    template <typename Out>
    Out do_generate(Out out, std::string const& data) {
        container_size_type len = data.length();
        out = std::copy_n(reinterpret_cast<char const*>(&len), sizeof(len), out);
        return std::copy(data.begin(), data.end(), out);
    }

    template <typename Out, typename T>
    Out do_generate(Out out, std::vector<T> const& data) {
        container_size_type len = data.size();
        out = std::copy_n(reinterpret_cast<char const*>(&len), sizeof(len), out);
        for (auto& el : data)
            out = do_generate(out, el);
        return out;
    }

    template <typename Out> Out do_generate(Out out, uint8_t const& data) {
        return std::copy_n(&data, sizeof(data), out);
    }

    template <typename Out>
    Out do_generate(Out out, uint16_t const& data) {
        return std::copy_n(reinterpret_cast<char const*>(&data), sizeof(data), out);
    }

    template <typename Out>
    Out do_generate(Out out, uint32_t const& data) {
        return std::copy_n(reinterpret_cast<char const*>(&data), sizeof(data), out);
    }

    ////////////////////////////////////////////////////////////////////////////
    // parsers
    template <typename It>
    bool parse_raw(It& in, It last, char* raw_into, size_t n) { // length guarded copy_n
        while (in != last && n) {
            *raw_into++ = *in++;
            --n;
        }
        return n == 0;
    }

    template <typename It, typename T>
    bool parse_raw(It& in, It last, T& into) {
        static_assert(std::is_trivially_copyable_v<T>);
        return parse_raw(in, last, reinterpret_cast<char*>(&into), sizeof(into));
    }

    template <typename It>
    bool do_parse(It& in, It last, std::string& data) {
        container_size_type len;
        if (!parse_raw(in, last, len))
            return false;
        data.resize(len);
        return parse_raw(in, last, data.data(), len);
    }

    template <typename It, typename T>
    bool do_parse(It& in, It last, std::vector<T>& data) {
        container_size_type len;
        if (!parse_raw(in, last, len))
            return false;
        data.clear();
        data.reserve(len);
        while (len--) {
            data.emplace_back();
            if (!do_parse(in, last, data.back()))
                return false;
        }
        return true;
    }

    template <typename It>
    bool do_parse(It& in, It last, uint8_t& data) {
        return parse_raw(in, last, data);
    }

    template <typename It>
    bool do_parse(It& in, It last, uint16_t& data) {
        return parse_raw(in, last, data);
    }

    template <typename It>
    bool do_parse(It& in, It last, uint32_t& data) {
        return parse_raw(in, last, data);
    }
}

namespace MangoLib {

    template <typename Out> Out do_generate(Out out, ValType const& x) {
        using my_serialization_helpers::do_generate;
        return do_generate(out,
                           static_cast<std::underlying_type_t<ValType>>(x));
    }
    template <typename It> bool do_parse(It& in, It last, ValType& x) {
        using my_serialization_helpers::do_parse;
        std::underlying_type_t<ValType> tmp;
        bool ok = do_parse(in, last, tmp);
        if (ok)
            x = static_cast<ValType>(tmp);
        return ok;
    }

    template <typename Out> Out do_generate(Out out, FuntionMango const& x) {
        using my_serialization_helpers::do_generate;
        out = do_generate(out, x.getParamTypes());
        out = do_generate(out, x.getReturnTypes());
        return out;
    }
    template <typename It> bool do_parse(It& in, It last, FuntionMango& x) {
        using my_serialization_helpers::do_parse;
        return do_parse(in, last, x.getParamTypes()) &&
            do_parse(in, last, x.getReturnTypes());
    }

    template <typename Out> Out do_generate(Out out, MangoType const& x) {
        using my_serialization_helpers::do_generate;
        out = do_generate(out, x.getContent());
        return out;
    }
    template <typename It> bool do_parse(It& in, It last, MangoType& x) {
        using my_serialization_helpers::do_parse;
        return do_parse(in, last, x.getContent());
    }

    template <typename Out> Out do_generate(Out out, Mango const& x) {
        out = do_generate(out, x.getMangoType());
        return out;
    }
    template <typename It> bool do_parse(It& in, It last, Mango& x) {
        return do_parse(in, last, x.getMangoType());
    }
}

#include <cassert>

MangoLib::Mango makeMango() {
    MangoLib::Mango mango;

    using MangoLib::ValType;
    MangoLib::FuntionMango f1;
    f1.getParamTypes()  = {ValType::bool_, ValType::string_};
    f1.getReturnTypes() = {ValType::void_};

    MangoLib::FuntionMango f2;
    f2.getParamTypes()  = {ValType::string_};
    f2.getReturnTypes() = {ValType::int_};

    mango.getMangoType().getContent() = {f1, f2};
    return mango;
}

#include <fstream>

int main() {
    auto const mango = makeMango();

    auto const bytes = serialize(mango);
    auto const roundtrip = serialize(MangoLib::deserialize(bytes));
    assert(roundtrip == bytes);

    // alternatively with file IO:
    {
        std::ofstream ofs("output.bin", std::ios::binary);
        serialize_to_stream(ofs, mango);
    }
    // read back:
    {
        std::ifstream ifs("output.bin", std::ios::binary);
        MangoLib::Mango from_file;
        deserialize(ifs, from_file);

        assert(serialize(from_file) == bytes);
    }

    std::cout << "\nDebug dump " << std::dec << bytes.size() << " bytes:\n";
    for (auto ch : bytes)
        std::cout << "0x" << std::hex << std::setw(2) << std::setfill('0')
                  << static_cast<int>((uint8_t)ch) << " " << std::dec;
    std::cout << "\nDone\n";
}

// suggested implementations:
namespace MangoLib {
    std::vector<uint8_t> serialize(Mango const& Man) {
        std::vector<uint8_t> bytes;
        do_generate(back_inserter(bytes), Man);
        return bytes;
    }

    Mango deserialize(std::span<uint8_t const> data) {
        Mango result;
        auto  f = begin(data), l = end(data);
        if (!do_parse(f, l, result))
            throw std::runtime_error("deserialize");
        return result;
    }

    void serialize_to_stream(std::ostream& os, Mango const& Man)  {
        do_generate(std::ostreambuf_iterator<char>(os), Man);
    }

    void deserialize(std::istream& is, Mango& Man) {
        Man = {}; // clear it!
        std::istreambuf_iterator<char> f(is), l{};
        if (!do_parse(f, l, Man))
            throw std::runtime_error("deserialize");
    }
}

Which roundtrips correctly and prints the debug output:

Debug dump 25 bytes:
0x02 0x00 0x00 0x00 0x02 0x00 0x00 0x00 0x02 0x03 0x01 0x00 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x03 0x01 0x00 0x00 0x00 0x01 
Done
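
If Mango grows further members, the same pattern should carry over: each new type gets its own do_generate/do_parse pair, and the owning class chains the calls for its members in a fixed order. Below is a minimal, self-contained sketch of that idea. ColorType, Basket, their fields and the demo namespace are hypothetical names used only for illustration, the structs use public fields instead of getters, and the two byte-level helpers are simplified stand-ins for my_serialization_helpers (native byte order, no containers).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

namespace demo {
    // Hypothetical extra member type (not part of the question).
    struct ColorType {
        std::uint8_t r, g, b;
    };

    // Hypothetical owner with several members.
    struct Basket {
        std::uint32_t count;
        ColorType     color;
    };

    // Simplified primitive writers: raw bytes in native byte order.
    template <typename Out> Out do_generate(Out out, std::uint8_t v) {
        *out++ = v;
        return out;
    }
    template <typename Out> Out do_generate(Out out, std::uint32_t v) {
        auto const* p = reinterpret_cast<unsigned char const*>(&v);
        return std::copy_n(p, sizeof(v), out);
    }

    // Simplified primitive readers: fail (return false) on short input.
    template <typename It> bool do_parse(It& in, It last, std::uint8_t& v) {
        if (in == last)
            return false;
        v = static_cast<std::uint8_t>(*in++);
        return true;
    }
    template <typename It> bool do_parse(It& in, It last, std::uint32_t& v) {
        auto* p = reinterpret_cast<unsigned char*>(&v);
        for (std::size_t i = 0; i < sizeof(v); ++i) {
            if (in == last)
                return false;
            p[i] = static_cast<unsigned char>(*in++);
        }
        return true;
    }

    // One generate/parse pair per new type: fields in a fixed order.
    template <typename Out> Out do_generate(Out out, ColorType const& c) {
        out = do_generate(out, c.r);
        out = do_generate(out, c.g);
        return do_generate(out, c.b);
    }
    template <typename It> bool do_parse(It& in, It last, ColorType& c) {
        return do_parse(in, last, c.r) && do_parse(in, last, c.g) &&
            do_parse(in, last, c.b);
    }

    // The owner simply chains its members, in the same order both ways.
    template <typename Out> Out do_generate(Out out, Basket const& b) {
        out = do_generate(out, b.count);
        return do_generate(out, b.color);
    }
    template <typename It> bool do_parse(It& in, It last, Basket& b) {
        return do_parse(in, last, b.count) && do_parse(in, last, b.color);
    }

    inline std::vector<std::uint8_t> serialize(Basket const& b) {
        std::vector<std::uint8_t> bytes;
        do_generate(std::back_inserter(bytes), b);
        return bytes;
    }
    inline bool deserialize(std::vector<std::uint8_t> const& bytes, Basket& b) {
        auto f = bytes.begin(), l = bytes.end();
        return do_parse(f, l, b);
    }

    // Usage: serialize, parse back, compare fields.
    inline void roundtrip_check() {
        Basket in{7, {1, 2, 3}};
        Basket out{};
        bool ok = deserialize(serialize(in), out);
        assert(ok && out.count == 7 && out.color.b == 3);
        (void)ok;
    }
}
```

The cost per extra member is one line in each direction, so the amount of code grows linearly with the number of fields; whether that is fast enough depends on the data sizes involved.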

Portability

This assumes endianness is not an issue. Of course you might want to normalize endianness. You can do it manually (using ntoh/hton family e.g.), or you could use Boost Endian - which does not require linking to any boost library (Boost Endian is header-only).

E.g.: http://coliru.stacked-crooked.com/a/288829ec964a3ca9
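
For the manual route, one common way to normalize byte order without any library is to write integers byte by byte using shifts, which does not depend on the host's endianness. The following is a minimal sketch under that assumption; put_u32_le, get_u32_le and the endian_sketch namespace are hypothetical names, not part of the code above, and little-endian is chosen arbitrarily as the on-disk order.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

namespace endian_sketch {
    // Write a 32-bit value as 4 bytes, least significant byte first,
    // independent of the host byte order.
    template <typename Out> Out put_u32_le(Out out, std::uint32_t v) {
        for (int shift = 0; shift < 32; shift += 8)
            *out++ = static_cast<std::uint8_t>((v >> shift) & 0xFFu);
        return out;
    }

    // Read 4 bytes back into a 32-bit value; returns false on short input.
    template <typename It> bool get_u32_le(It& in, It last, std::uint32_t& v) {
        std::uint32_t tmp = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            if (in == last)
                return false;
            tmp |= static_cast<std::uint32_t>(static_cast<std::uint8_t>(*in++))
                << shift;
        }
        v = tmp;
        return true;
    }

    // Usage: the byte layout is fixed, whatever the host is.
    inline void roundtrip_check() {
        std::vector<std::uint8_t> bytes;
        put_u32_le(std::back_inserter(bytes), 0x11223344u);
        assert(bytes.size() == 4 && bytes[0] == 0x44 && bytes[3] == 0x11);

        auto f = bytes.cbegin(), l = bytes.cend();
        std::uint32_t v = 0;
        bool ok = get_u32_le(f, l, v);
        assert(ok && v == 0x11223344u);
        (void)ok;
    }
}
```

Helpers like these could replace the reinterpret_cast-based copies in the uint32_t overloads of do_generate and do_parse; on a little-endian host the resulting bytes are the same as in the debug dump above.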

sehe
  • how will it work with my problem? can you please show me with my class? – James_sheford Oct 06 '22 at 07:05
  • Rewrote the answer from your types. Implemented all the suggested interfaces. – sehe Oct 06 '22 at 11:15
  • what if Mango class contains more fields like MangoType typeMan , FruiteType TypeFru, ColorType typeColor...and many more...Is your solution fast enough compare to any other method ? – James_sheford Oct 06 '22 at 14:06
  • Performance wasn't on your list of requirements. Everything depends on your application. The other answer is more uncompromising for speed but doesn't immediately afford container convenience etc. How about using memory mapped files to begin with? http://coliru.stacked-crooked.com/a/29965b79f234e110 – sehe Oct 06 '22 at 15:36
  • if I used boost serialization will it be way more faster than your solution ? – James_sheford Oct 06 '22 at 17:22
  • Boost's Serialization is strictly more complicated and has a ton of features not here, so it will be slower. – sehe Oct 06 '22 at 20:38
  • I'm not going to/able to divine what is fastest in your particular scenario, because "the best" depends on many factors. At some points the design trade-offs tip over so e.g. the maintenance cost/susceptibility for error becomes more than is warranted by e.g. performance edge. – sehe Oct 06 '22 at 20:40
  • I cannot comment in a way that speaks more than the code. If it helps, they are the [simplest form of "customization points"](https://quuxplusone.github.io/blog/2018/03/19/customization-points-for-functions/) that can leverage [ADL](https://en.cppreference.com/w/cpp/language/adl) for your own overloads. Nothing more. – sehe Oct 07 '22 at 13:47
  • can you give a more good example ? Like can you make makeMango() with some values? here: ```auto const mango = makeMango();``` – James_sheford Oct 20 '22 at 11:21
  • @James_sheford I don't know how I can "give a more good example". Regarding generating some values, you can use the code from my [Oct 6th comment](https://stackoverflow.com/questions/73953638/how-to-do-serialization-of-class-having-members-of-custom-data-types-in-c/73963168?noredirect=1#comment130618320_73963168). If you have a concrete question about what you're stuck with, you can post a separate question, thank you. – sehe Oct 21 '22 at 12:21

As @Eljay says in a comment, the exact solution depends on the use case.

For me, if it is a one-off project, the most straightforward "binary dump" method would be to reconsider your basic data types and store everything compactly, using fixed-size structures.

struct FuntionMango
{
    int NumParams; // valid items in Param/Return arrays
    int NumReturns;

    ValType ParamTypes[MAX_PARAMS];
    ValType ReturnTypes[MAX_RETURNS];
};

struct MangoType
{
    int NumContent; // valid items in Content array
    // Fixed array instead of vector<FuntionMango>
    FuntionMango Content[MAX_FUNCTIONS];
};

struct Mango // all fields are just 'public'
{
    MangoType typeMan;
};

Then the "save" procedure would be

void saveMango(const char* filename, Mango* mango)
{
    FILE* OutFile = fopen(...);
    fwrite(mango, 1, sizeof(Mango), OutFile);
    fclose(OutFile);
}

and load just uses "fread" (of course, all error handling and file integrity checking is omitted)

void loadMango(const char* filename, Mango* mango)
{
    FILE* InFile = fopen(...);
    fread(mango, 1, sizeof(Mango), InFile);
    fclose(InFile);
}

To convert your Mango into a byte array, just use a reinterpret_cast or a C-style cast.

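Unfortunately, this approach fails if any of your structures contains pointer fields (the pointed-to data is not saved) or is not trivially copyable, e.g. because it has a non-trivial copy constructor or destructor, as std::vector members do.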
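
As pointed out in the comments, the precondition for a raw dump is that the type is trivially copyable, and that can be asserted at compile time. Here is a minimal sketch of such a guard. The to_bytes/from_bytes helpers, the Pod struct, the MAX_PARAMS value and the dump_sketch namespace are hypothetical names for illustration; note that the check cannot detect raw pointer members, which are trivially copyable but still meaningless after reloading.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

namespace dump_sketch {
    constexpr int MAX_PARAMS = 4; // hypothetical capacity, pick your own

    // Hypothetical fixed-size record in the style of the structs above.
    struct Pod {
        int          NumParams;
        std::uint8_t ParamTypes[MAX_PARAMS];
    };

    // Copy the object representation into a byte vector.
    template <typename T> std::vector<std::uint8_t> to_bytes(T const& obj) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw dump requires a trivially copyable type");
        std::vector<std::uint8_t> bytes(sizeof(T));
        std::memcpy(bytes.data(), &obj, sizeof(T));
        return bytes;
    }

    // Rebuild an object from bytes produced by to_bytes on the same platform.
    template <typename T> T from_bytes(std::vector<std::uint8_t> const& bytes) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw dump requires a trivially copyable type");
        if (bytes.size() != sizeof(T))
            throw std::length_error("from_bytes: size mismatch");
        T obj;
        std::memcpy(&obj, bytes.data(), sizeof(T));
        return obj;
    }

    // A struct holding a std::vector would be rejected by the guard:
    static_assert(!std::is_trivially_copyable<std::vector<int>>::value,
                  "std::vector is not trivially copyable");

    // Usage: dump a record and load it back.
    inline void roundtrip_check() {
        Pod in{};
        in.NumParams     = 2;
        in.ParamTypes[0] = 3;
        in.ParamTypes[1] = 1;

        Pod out = from_bytes<Pod>(to_bytes(in));
        assert(out.NumParams == 2 && out.ParamTypes[1] == 1);
        (void)out;
    }
}
```

With the guard in place, trying to dump a type that contains std::vector members fails to compile instead of silently producing unusable bytes.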

[EDIT (on request)]

Conversion to a byte array (filling a std::vector<uint8_t>) can be done with the iterator-range constructor of std::vector:

Mango mango;
uint8_t* rawPointer = reinterpret_cast<uint8_t*>(&mango);
std::vector<uint8_t> byteArray(rawPointer, rawPointer + sizeof(Mango));

And vice versa, to convert the byte array back to a Mango:

Mango otherMango;
uint8_t* rawPointer2 = reinterpret_cast<uint8_t*>(&otherMango);
memcpy(rawPointer2, byteArray.data(), sizeof(Mango));
Viktor Latypov
  • Or non trivial destructors. The point is when they are not trivially copyable. Which can be asserted. – sehe Oct 05 '22 at 21:04
  • @sehe Yes, my method here is "anti-C++" (the whole point is to evaluate if the OP really needs complex structures and full support for nested non-trivial STL-heavy datatypes). The answer I gave here is because the OP requested some advice in a comment for another answer . I would go the protobuf/Qt(moc)/boost.serialization way or implement some C++ header preprocessor myself. – Viktor Latypov Oct 06 '22 at 06:03
  • @ViktorLatypov , Thanks.. But I do not need it to write it in a file...I want them as vector or in memory – James_sheford Oct 06 '22 at 10:31
  • @ViktorLatypov, The class does not have any constructors / destructors.. can you please do it in a form that gives me a vector of bytes ( serialized data ) rather than writing to a file? – James_sheford Oct 06 '22 at 10:34
  • Basically, a memcpy call is what you need - or a wrapped memcpy call in std::vector's constructor. I have added these lines to the answer. By the way, "the class does not have any constructors/destructors" is a dangerous assumption. When you are using std::vector or similar containers, you get an automatically generated constructor which implicitly initializes these fields. Make sure you use only fixed-size arrays as your structure's fields (as is shown in my code sample). Don't try to memcpy a structure with std::vector fields! – Viktor Latypov Oct 06 '22 at 11:56
  • @ViktorLatypov what if Mango class contains more fields like MangoType typeMan , FruiteType TypeFru, ColorType typeColor...and many more...Is your solution fast and safe enough compare to any other method ? – James_sheford Oct 06 '22 at 18:48
  • If all the types are POD (plain-old-data, no pointers, fixed-size arrays, no constructors/destructors), then this method works and is rather fast (a single memcpy for your data structure). And as any C++ programmer would say, it is as safe as a sharp two-sided razor without a handle... I mean, it will work, but be careful with number of elements in arrays and out-of-bounds stuff. – Viktor Latypov Oct 06 '22 at 21:10
  • @ViktorLatypov, I see there is one constructor in another of Mango's fields.. with this, will your solution fail completely? Or can I use a different function other than memcpy? – James_sheford Oct 07 '22 at 06:30
  • Depends on what the constructor does. If it is { someField = const_value; } then most likely it is OK to memcpy. By the way, my solution is literally 3-5 lines of code, so you can check it in your case. Most likely, it will just work. – Viktor Latypov Oct 07 '22 at 18:04