
I'm working with a machine that uses a proprietary protocol. I have defined all of the requests and responses, but some of the variables need to be serialized in an odd way. Some parts of the protocol simply take the binary representation of the objects, but in other parts I have to encode integers into an array of human-readable characters. Some of those arrays must be encoded in decimal (allowed characters are '0'-'9') and some in hexadecimal ('0'-'9', 'A'-'F'). Also, some of the booleans are encoded as either "01" or "00". Let's say I convert those with the function:

template<size_t arraySize, unsigned int base>
std::array<unsigned char, arraySize> intToChars(unsigned int input);

From the examples I have seen, it seems that I can only describe a serialize function and not a deserialize function. But in this case I think the serialization and deserialization processes are completely different: I don't think Boost.Serialization can figure out what I did to the integer before I pushed it into the archive. So I can't simply slap on something like:

template<class Archive>
void amp::types::data::ParametricFilterData::serialize(
        boost::archive::binary_oarchive &ar, const unsigned int version) {
    ar & util::boolToChars<2>(enabled);
    ar & util::intToChars<4,16>(frequency);
    ar & util::intToChars<4,16>(gain);
    ar & util::intToChars<4,10>(slope);
}

Because I also made a different function

template<size_t arraySize, unsigned int base>
unsigned int charsToInt(std::array<unsigned char,arraySize> input);

But I don't know of a deserialize function to put it in.
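For concreteness, here is a minimal sketch of how the two helpers could be implemented using only the standard library. The question only gives their declarations, so the details are assumptions: fixed width, most significant digit first, zero padding on the left, and uppercase hex digits. `tryCharsToInt` is a hypothetical name for a validating variant of `charsToInt` that reports malformed input instead of returning a plain integer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical implementation of the helper declared in the question.
// Assumed format: fixed width, zero-padded, most significant digit first.
template <std::size_t arraySize, unsigned int base>
std::array<unsigned char, arraySize> intToChars(unsigned int input) {
    static_assert(base >= 2 && base <= 16, "digits are limited to 0-9, A-F");
    constexpr char digits[] = "0123456789ABCDEF";
    std::array<unsigned char, arraySize> out{};
    // Fill from the least significant digit (rightmost slot) backwards.
    for (std::size_t i = arraySize; i-- > 0;) {
        out[i] = static_cast<unsigned char>(digits[input % base]);
        input /= base;
    }
    assert(input == 0 && "value does not fit in arraySize digits");
    return out;
}

// Hypothetical validating counterpart of charsToInt: returns std::nullopt
// when a character is not a valid digit for the given base.
// Note: no overflow check is done for very wide arrays.
template <std::size_t arraySize, unsigned int base>
std::optional<unsigned int>
tryCharsToInt(std::array<unsigned char, arraySize> const& input) {
    static_assert(base >= 2 && base <= 16, "digits are limited to 0-9, A-F");
    unsigned int value = 0;
    for (unsigned char c : input) {
        unsigned int digit = 0;
        if (c >= '0' && c <= '9')
            digit = static_cast<unsigned int>(c - '0');
        else if (c >= 'A' && c <= 'F')
            digit = static_cast<unsigned int>(c - 'A') + 10;
        else
            return std::nullopt;
        if (digit >= base)
            return std::nullopt; // e.g. 'A' in a decimal field
        value = value * base + digit;
    }
    return value;
}
```

With these assumptions, `intToChars<4, 16>(0x1A2B)` yields the characters `1A2B` and `intToChars<4, 10>(42)` yields `0042`; the open question is where the inverse call belongs when the archive drives both directions through one `serialize` function.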

P.S. I'm trying to find a good reference with all the classes and functions included in boost/serialization, but so far I've only found https://www.boost.org/doc/libs/1_34_0/libs/serialization/doc/index.html

Typhaon
  • Eeeek! That sounds horrid. Your description of some of the wire format in this proprietary protocol does have some resemblance to some of the ASN.1 wire formats. For instance with ASN.1 uPER, integers can be represented either in a binary way (for integers not exceeding some large value) or as text (for very, very big values). Similar with floats. However, the bools don't fit my knowledge of how uPER works. But just in case, has ASN.1 ever been mentioned at all, anywhere, in any of the documentation? – bazza Nov 02 '22 at 18:13
  • Side note: in Boost serialization both `save` and `load` are implemented in terms of `serialize` by default: it's a design goal of the library to avoid repeating the code. You **can** split the implementations (even in save/load and save_construct_data and load_construct_data in advanced use cases) but I don't think it matters for your question. – sehe Nov 03 '22 at 01:42

1 Answer


If you need to control the wire format, Boost Serialization is not your ticket. Full stop.

Boost Serialization is opinionated and defines its own archive formats (e.g. XML, Binary and Text). None of these formats will fit your needs, and even if isolated details can be controlled, on the whole the larger archive features (archive versioning, container serialization, object tracking, type registration, polymorphic types, class versioning etc.) will ruin your day.

Instead, you probably have to devise your own kind of serialization. I happen to have recently created a simplistic example "framework" in my answer to "Boost Serialization Binary Archive giving incorrect output" (1): basically, for any T you implement

 OutputIterator do_generate(OutputIterator, T const& obj);
 bool do_parse(ForwardIterator& f, ForwardIterator l, T& obj);

Now since that design uses iterators, it will be relatively easy to support the other formats out of the box using other mechanisms like std::to_chars/std::from_chars, std::format (or libfmt, boost::format etc.) and Spirit (e.g. X3) for the parsing.

Let me modify the example to show e.g. the bool ("01"/"00") and hex formatting for a signed integer (int32_t here):

template <typename Out>
Out do_generate(Out out, int32_t const& data) {
    return fmt::format_to(out, "[{:x}]", data);
}

template <typename Out>
Out do_generate(Out out, bool const& data) {
    return fmt::format_to(out, "{:02}", data ? 1 : 0);
}

I used libfmt to do the serialization. To deserialize, I used Spirit X3:

template <typename It>
bool do_parse(It& f, It l, int32_t& data) {
    static const x3::int_parser<int32_t, 16, 1, 8> p{};
    return parse(f, l, '[' >> p >> ']', data);
}

template <typename It>
bool do_parse(It& f, It l, bool& data) {
    return parse(f, l, x3::int_parser<bool, 10, 2, 2>(), data);
}
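If you would rather not depend on libfmt or Spirit, the same pair of functions can be sketched with std::to_chars/std::from_chars. The overloads below are an illustration, not part of the example that follows: they assume a field of exactly four uppercase hex digits for a std::uint16_t, and they would replace (not coexist with) a raw-binary overload for the same type.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <system_error>

// Hypothetical overload: write a uint16_t as exactly 4 uppercase hex digits.
template <typename Out>
Out do_generate(Out out, std::uint16_t const& data) {
    std::array<char, 4> buf;
    auto [end, ec] = std::to_chars(buf.data(), buf.data() + buf.size(), data, 16);
    assert(ec == std::errc{}); // a uint16_t never needs more than 4 hex digits
    auto const len = static_cast<std::size_t>(end - buf.data());
    out = std::fill_n(out, buf.size() - len, '0'); // zero-pad on the left
    // std::to_chars emits lowercase digits; the assumed format wants uppercase.
    return std::transform(buf.data(), end, out, [](char c) {
        return (c >= 'a' && c <= 'f') ? static_cast<char>(c - 'a' + 'A') : c;
    });
}

// Hypothetical overload: read exactly 4 hex digits; the iterator is only
// advanced when the whole field parses successfully.
template <typename It>
bool do_parse(It& f, It l, std::uint16_t& data) {
    std::array<char, 4> buf;
    It cur = f;
    for (char& c : buf) {
        if (cur == l)
            return false; // input too short
        c = static_cast<char>(*cur++);
    }
    std::uint16_t value{};
    auto [ptr, ec] = std::from_chars(buf.data(), buf.data() + buf.size(), value, 16);
    if (ec != std::errc{} || ptr != buf.data() + buf.size())
        return false; // a non-hex character somewhere in the field
    data = value;
    f = cur;
    return true;
}
```

Note that std::from_chars accepts both upper- and lowercase hex digits, so this parser is slightly more lenient than the generator; tighten it if the device rejects lowercase.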

Simplified example (Live On Coliru):

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <span>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

#include <boost/spirit/home/x3.hpp> // using x3 for parsing
#include <fmt/ranges.h>             // using libfmt
#include <fstream>                  // debug output
namespace x3 = boost::spirit::x3;

namespace MyLib {
    // your types (with some demo fill)
    using ValType  = int32_t;
    using ValTypes = std::vector<ValType>;
    struct DataType {
        ValTypes              params, retval;
        bool                  blessed;
        std::vector<uint16_t> data;
    };

    // high-level interface
    std::vector<uint8_t> serialize(DataType const& Man);
    DataType             deserialize(std::span<uint8_t const> data);

} // namespace MyLib

namespace my_serialization_helpers {

    ////////////////////////////////////////////////////////////////////////////
    // This namespace serves as an extension point for your serialization; in
    // particular we choose endianness and representation of strings
    //
    // TODO add overloads as needed (signed integer types, binary floats,
    // containers of... etc)
    ////////////////////////////////////////////////////////////////////////////
    
    // decide on the max supported container capacity:
    using container_size_type = std::uint32_t;
    
    ////////////////////////////////////////////////////////////////////////////
    // generators
    template <typename Out> Out do_generate(Out out, uint8_t const& data) {
        return std::copy_n(&data, sizeof(data), out);
    }

    template <typename Out>
    Out do_generate(Out out, uint16_t const& data) {
        return std::copy_n(reinterpret_cast<char const*>(&data), sizeof(data), out);
    }

    template <typename Out>
    Out do_generate(Out out, uint32_t const& data) {
        return std::copy_n(reinterpret_cast<char const*>(&data), sizeof(data), out);
    }

    template <typename Out>
    Out do_generate(Out out, int32_t const& data) {
        return fmt::format_to(out, "[{:x}]", data);
    }

    template <typename Out>
    Out do_generate(Out out, bool const& data) {
        return fmt::format_to(out, "{:02}", data ? 1 : 0);
    }

    template <typename Out>
    Out do_generate(Out out, std::string const& data) {
        container_size_type len = data.length();
        out = std::copy_n(reinterpret_cast<char const*>(&len), sizeof(len), out);
        return std::copy(data.begin(), data.end(), out);
    }

    template <typename Out, typename T>
    Out do_generate(Out out, std::vector<T> const& data) {
        container_size_type len = data.size();
        out = std::copy_n(reinterpret_cast<char const*>(&len), sizeof(len), out);
        for (auto& el : data)
            out = do_generate(out, el);
        return out;
    }

    ////////////////////////////////////////////////////////////////////////////
    // parsers
    template <typename It>
    bool parse_raw(It& in, It last, char* raw_into, size_t n) { // length guarded copy_n
        while (in != last && n) {
            *raw_into++ = *in++;
            --n;
        }
        return n == 0;
    }

    template <typename It, typename T>
    bool parse_raw(It& in, It last, T& into) {
        static_assert(std::is_trivially_copyable_v<T>);
        return parse_raw(in, last, reinterpret_cast<char*>(&into), sizeof(into));
    }

    template <typename It>
    bool do_parse(It& in, It last, uint8_t& data) {
        return parse_raw(in, last, data);
    }

    template <typename It>
    bool do_parse(It& in, It last, uint16_t& data) {
        return parse_raw(in, last, data);
    }

    template <typename It>
    bool do_parse(It& in, It last, uint32_t& data) {
        return parse_raw(in, last, data);
    }

    template <typename It>
    bool do_parse(It& f, It l, int32_t& data) {
        static const x3::int_parser<int32_t, 16, 1, 8> p{};
        return parse(f, l, '[' >> p >> ']', data);
    }

    template <typename It>
    bool do_parse(It& f, It l, bool& data) {
        return parse(f, l, x3::int_parser<bool, 10, 2, 2>(), data);
    }

    template <typename It>
    bool do_parse(It& in, It last, std::string& data) {
        container_size_type len;
        if (!parse_raw(in, last, len))
            return false;
        data.resize(len);
        return parse_raw(in, last, data.data(), len);
    }

    template <typename It, typename T>
    bool do_parse(It& in, It last, std::vector<T>& data) {
        container_size_type len;
        if (!parse_raw(in, last, len))
            return false;
        data.clear();
        data.reserve(len);
        while (len--) {
            data.emplace_back();
            if (!do_parse(in, last, data.back()))
                return false;
        }
        return true;
    }

} // namespace my_serialization_helpers

namespace MyLib {
    template <typename Out> Out do_generate(Out out, DataType const& x) {
        using my_serialization_helpers::do_generate;
        out = do_generate(out, x.params);
        out = do_generate(out, x.retval);
        out = do_generate(out, x.blessed);
        out = do_generate(out, x.data);
        return out;
    }
    template <typename It> bool do_parse(It& in, It last, DataType& x) {
        using my_serialization_helpers::do_parse;
        return do_parse(in, last, x.params) && //
            do_parse(in, last, x.retval) &&    //
            do_parse(in, last, x.blessed) &&   //
            do_parse(in, last, x.data);
    }
}

int main() {
    MyLib::DataType const object =
    { {1111, 2222, 3333, 4444, 5555, 6666, 7777}, // params
      {0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD},           // retvals
      true,                                       // blessed
      {1111, 2222, 3333, 4444, 5555, 6666, 7777}, // data
    };

    auto const bytes = serialize(object);
    fmt::print("\nDebug dump {} bytes: {::02X}\n", bytes.size(), bytes);

    std::ofstream("dump.bin").write(reinterpret_cast<char const*>(bytes.data()), bytes.size());

    auto const roundtrip = serialize(MyLib::deserialize(bytes));
    fmt::print("Roundtrip verified: {}\n", roundtrip == bytes);
}

// suggested implementations:
namespace MyLib {
    std::vector<uint8_t> serialize(DataType const& obj) {
        std::vector<uint8_t> bytes;
        do_generate(back_inserter(bytes), obj);
        return bytes;
    }

    DataType deserialize(std::span<uint8_t const> data) {
        DataType obj;
        auto  f = begin(data), l = end(data);
        if (!do_parse(f, l, obj))
            throw std::runtime_error("deserialize");
        return obj;
    }
}

Prints:

Debug dump 91 bytes: [07, 00, 00, 00, 5B, 34, 35, 37, 5D, 5B, 38, 61, 65, 5D, 5B, 64, 30, 35, 5D, 5B, 31, 31, 35, 63, 5D, 5B, 31, 35, 62, 33, 5D, 5B, 31, 61, 30, 61, 5D, 5B, 31, 65, 36, 31, 5D, 04, 00, 00, 00, 5B, 61, 61, 61, 61, 5D, 5B, 62, 62, 62, 62, 5D, 5B, 63, 63, 63, 63, 5D, 5B, 64, 64, 64, 64, 5D, 30, 31, 07, 00, 00, 00, 57, 04, AE, 08, 05, 0D, 5C, 11, B3, 15, 0A, 1A, 61, 1E]
Roundtrip verified: true

And writes the same bytes to dump.bin:

00000000: 0700 0000 5b34 3537 5d5b 3861 655d 5b64  ....[457][8ae][d
00000010: 3035 5d5b 3131 3563 5d5b 3135 6233 5d5b  05][115c][15b3][
00000020: 3161 3061 5d5b 3165 3631 5d04 0000 005b  1a0a][1e61]....[
00000030: 6161 6161 5d5b 6262 6262 5d5b 6363 6363  aaaa][bbbb][cccc
00000040: 5d5b 6464 6464 5d30 3107 0000 0057 04ae  ][dddd]01....W..
00000050: 0805 0d5c 11b3 150a 1a61 1e              ...\.....a.

(1) See also the more recent "How to do serialization of Class having members of custom data types in C++?", which builds on that, and "How can I optimize C++?", which goes beyond it.

sehe
  • Thanks for the answer! I was hoping that the binary archive would be able to fit my needs if I remove the header. Apparently not. I'm still not sure how spirit will fit my needs instead, but I will read up about it. – Typhaon Nov 03 '22 at 08:52
  • Spirit is not required at all, it's just what I'd use to avoid having to write the parsers manually. As you can see `do_parse` just takes iterators, and you can do the work in any way you want (regex, spirit, standard library etc) – sehe Nov 03 '22 at 10:19
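To illustrate the last comment, here is a minimal hand-written sketch of the bool overloads that needs neither libfmt nor Spirit. It is an assumption-laden variant rather than the code from the answer: it accepts only the exact strings "00" and "01", and it leaves the iterator untouched on failure. `bool_roundtrip_demo` is a hypothetical helper that only exists to show usage.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical Spirit-free generator: a bool becomes "01" or "00".
template <typename Out>
Out do_generate(Out out, bool const& data) {
    *out++ = '0';
    *out++ = data ? '1' : '0';
    return out;
}

// Hypothetical Spirit-free parser: accepts exactly "00" or "01" and only
// advances the iterator when the field was parsed successfully.
template <typename It>
bool do_parse(It& f, It l, bool& data) {
    It cur = f;
    if (cur == l || *cur != '0')
        return false;
    ++cur;
    if (cur == l || (*cur != '0' && *cur != '1'))
        return false;
    data = (*cur == '1');
    f = ++cur;
    return true;
}

// Usage check: generate "01", then parse it back into a bool.
inline void bool_roundtrip_demo() {
    std::string wire;
    do_generate(std::back_inserter(wire), true);
    assert(wire == "01");

    bool value = false;
    auto f = wire.cbegin();
    bool const ok = do_parse(f, wire.cend(), value);
    assert(ok && value && f == wire.cend());
    (void)ok; // silence unused-variable warnings when NDEBUG is defined
}
```

Because both functions only rely on iterators, they can stand in for the libfmt/X3 overloads without changing the code that composes them for larger types.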