How can I faster encoding randomclass?

Question

I want to serialize a C++ class Ramdomclass . Below is the serialization function.

std::vector<uint8_t> Serialize(Ramdomclass &Ramdom)
{
   
    std::vector<uint8_t> Buf;

   
    auto EncodedSize = EncodeU32(Ramdom.getName().length());
    Buf.insert(Buf.end(), EncodedSize.begin(), EncodedSize.end());
    Buf.insert(Buf.end(), Ramdom.getName().cbegin(), Ramdom.getName().cend());

    
    EncodedSize = EncodeU32(Ramdom.getContent().size());
    Buf.insert(Buf.end(), EncodedSize.begin(), EncodedSize.end());
    Buf.insert(Buf.end(), Ramdom.getContent().cbegin(), Ramdom.getContent().cend());

   
    std::vector<uint8_t> Result;
    Result.push_back(0x00U);

    EncodedSize = EncodeU32(Buf.size());
    Result.insert(Result.end(), EncodedSize.begin(), EncodedSize.end());

   
    Result.insert(Result.end(), Buf.begin(), Buf.end());

    
    return Result;
}


std::vector<uint8_t> EncodeU32(uint32_t d)
{

    std::vector<uint8_t> encoded;

    // unsigned LEB128 encoding
    do
    {
        auto x = d & 0b01111111;
        d >>= 7;
        if (d)
            x |= 0b10000000;
        encoded.push_back(x);
    } while (d);

    return encoded;
}

I think I can improve in terms of the way I am appending different parts in std::vector<uint8_t> Buf;. I want to know is there any better way of doing this ? maybe instead of using insert can I used anyone other way

Note:: All the things I am appending to std::vector<uint8_t> Buf are in bytes(binary).

Encoding *always* slows down execution. Removing the encoding part will make your serialization faster. — Thomas Matthews, Oct 15 '22 at 18:21
@ThomasMatthews thanks.. I have to do that ..the numbes that goes in EncodeU32 function are size of vector like 20,30 etc... — Sebastian Orteho, Oct 15 '22 at 18:55

sehe · Accepted Answer · 2022-10-17T14:50:24.100

The signature

std::vector<uint8_t> EncodeU32(uint32_t);

implies a lot of copying, and temporary vector construction. Prefer something like

template <typename Out> Out EncodeLEB128(uint32_t, Out);

Here you use an output iterator instead of allocating vectors.

The seconds issue is that you need to know the DOM size, causing you to copy all of the DOM data again. Instead, make it so you can calculate the size up front and avoid the extra copy:

Let's also remove the code duplication that makes it hard to read/maintain the code.

Predicting Sizes: LEB128

Variable-length integer encoding is nice, but it complicates predicting the effective serialized data size. Let's create an extra helper:

template <typename T> size_t LEB128_Len(T d) {
    return EncodeLEB128_Impl(std::move(d), [](auto&&){});
}

template <typename T, typename Out> Out EncodeLEB128(T d, Out out) {
    EncodeLEB128_Impl(std::move(d), [&](uint8_t v) { *out++ = v; });
    return out;
}

As you can see, I plan on implementing both with EncodeLEB128_Impl - again avoiding code duplication. Here it is, you'll not it's pretty much identical to your original code, except for side-effects and genericity:

template <typename T, typename F> size_t EncodeLEB128_Impl(T d, F callback) {
    static_assert(std::is_unsigned_v<T> && std::is_integral_v<T>);

    // unsigned LEB128 encoding
    size_t n = 0;
    do {
        unsigned int x = d & 0b01111111;
        d >>= 7;
        if (d)
            x |= 0b10000000;
        n++;
        callback(x);
    } while (d);

    return n;
}

Predicting Content Length

Now we can move up to ranges. The length prediction can now become:

template <typename R>
size_t Range_Len(R const& range) {
    using V  = decltype(*std::begin(range));
    size_t n = std::size(range);
    return LEB128_Len(n) + n * sizeof(V);
}

That's ... nice! Now we can picture the end result:

std::vector<uint8_t> Serialize(DomType const& dom) {
    auto const& name    = dom.getName();
    auto const& content = dom.getContent();
    auto const  domSize = Range_Len(name) + Range_Len(content);

    std::vector<uint8_t> result(1 + LEB128_Len(domSize) + domSize);
    auto                 out = result.begin();

    *out++ = 0x00U; // dom ID
    out    = EncodeLEB128(domSize, out);
    out    = EncodeRange(name, out);
    out    = EncodeRange(content, out);

    return result;
}

Notice how much cleaner that is! No more unnecessary copying or allocating, no more code duplication.

The only missing link is EncodeRange:

template <std::contiguous_iterator It, typename Out>
Out EncodeRange(It f, It l, Out out) {
    using V = typename std::iterator_traits<It>::value_type;
    static_assert(std::is_trivially_copyable_v<V>);

    size_t const n     = std::distance(f, l);
    auto const*  bytes = reinterpret_cast<uint8_t const*>(std::addressof(*f));
    return std::copy(bytes, bytes + n * sizeof(V), EncodeLEB128(n, out));
}

template <typename R, typename Out>
Out EncodeRange(R const& range, Out out) {
    return EncodeRange(std::begin(range), std::end(range), out);
}

Live Demo

Live On Compiler Explorer

Live On Coliru

#include <cstdint>
#include <iterator>
#include <span>
#include <string_view>
#include <type_traits>
#include <vector>

struct DomType {
    std::array<uint8_t, 16> data_{1, 2,  3,  4,  5,  6,  7,  8,
                                  9, 10, 11, 12, 13, 14, 15, 16};
    std::span<uint8_t const> getContent() const { return data_; }
    std::string_view         getName() const { return "name"; }
};

template <typename T, typename F> size_t EncodeLEB128_Impl(T d, F callback) {
    static_assert(std::is_unsigned_v<T> && std::is_integral_v<T>);

    // unsigned LEB128 encoding
    size_t n = 0;
    do {
        unsigned int x = d & 0b01111111;
        d >>= 7;
        if (d)
            x |= 0b10000000;
        n++;
        callback(x);
    } while (d);

    return n;
}

template <typename T> size_t LEB128_Len(T d) {
    return EncodeLEB128_Impl(std::move(d), [](auto&&){});
}

template <typename T, typename Out> Out EncodeLEB128(T d, Out out) {
    EncodeLEB128_Impl(std::move(d), [&](uint8_t v) { *out++ = v; });
    return out;
}

template <std::contiguous_iterator It, typename Out>
Out EncodeRange(It f, It l, Out out) {
    using V = typename std::iterator_traits<It>::value_type;
    static_assert(std::is_trivially_copyable_v<V>);

    size_t const n     = std::distance(f, l);
    auto const*  bytes = reinterpret_cast<uint8_t const*>(std::addressof(*f));
    return std::copy(bytes, bytes + n * sizeof(V), EncodeLEB128(n, out));
}

template <typename R, typename Out>
Out EncodeRange(R const& range, Out out) {
    return EncodeRange(std::begin(range), std::end(range), out);
}

template <typename R>
size_t Range_Len(R const& range) {
    using V  = decltype(*std::begin(range));
    size_t n = std::size(range);
    return LEB128_Len(n) + n * sizeof(V);
}

std::vector<uint8_t> Serialize(DomType const& dom) {
    auto const& name    = dom.getName();
    auto const& content = dom.getContent();
    auto const  domSize = Range_Len(name) + Range_Len(content);

    std::vector<uint8_t> result(1 + LEB128_Len(domSize) + domSize);
    auto                 out = result.begin();

    *out++ = 0x00U; // dom ID
    out    = EncodeLEB128(domSize, out);
    out    = EncodeRange(name, out);
    out    = EncodeRange(content, out);

    return result;
}

#include <fmt/ranges.h>
int main() { fmt::print("Result: {::02x}", Serialize(DomType{})); }

Prints

Result: [00, 16, 04, 6e, 61, 6d, 65, 10, 01, 02, 03, 04, 05, 06, 07, 08, 09, 0a, 0b, 0c, 0d, 0e, 0f, 10]

Thanks.. I am also looking for speed performance. Do you think your method is faster than mine this code https://onlinegdb.com/0MYoSJpkG ? — Sebastian Orteho, Oct 18 '22 at 22:59
@SebastianOrteho Huh, yeah obviously. And for all the reasons I explained in my answer. Why don't you just measure it anyways? — sehe, Oct 19 '22 at 00:13
Sure, so can you! http://coliru.stacked-crooked.com/a/4d0329570b3059f9. However I'd recommend a benchmark tool, like [using nonius](http://coliru.stacked-crooked.com/a/af4788cde81cde34), and you get much more precise results: http://stackoverflow-sehe.s3.amazonaws.com/1681c489-79c1-473b-b146-687ec580e86f/stats.html — sehe, Oct 19 '22 at 12:51
Oh look, after several tries, QuickBench managed to run it too: https://quick-bench.com/q/gPcarrLlJOFyu9wvk50xkXJW-HM — sehe, Oct 19 '22 at 13:26
Out of curiousity, this neatly demonstrates how it is a trade-off of compile-time vs runtime performance! The faster runtime takes a little bit longer to compile (https://build-bench.com/b/9y6HNBqTDnoMSM3bTmvtGLiv0Ok). And it's a true testament to C++ as a language, that we can have that without needing to write more low-level/complex code — sehe, Oct 19 '22 at 13:59
Thanks for the solution. But I find that the you code solution uses lot of c++20 features like std::contiguous_iterator ... Can you pls remove those ? I tried doing it but finding it very hard..I am not able to compile it locally..Please — Sebastian Orteho, Nov 16 '22 at 19:47
Which one, @SebastianOrteho? Here's the answer code in c++11: http://coliru.stacked-crooked.com/a/0c2a366474f14444 - note that we [can't check contiguous iterators](https://stackoverflow.com/a/42855677/85371), so you I'll have to make sure of that yourself (or limit the interface to raw pointers, e.g., or use a span implementation from Boost or GSL etc.) — sehe, Nov 16 '22 at 23:45

How can I faster encoding randomclass?

1 Answers1

Predicting Sizes: LEB128

Predicting Content Length

Live Demo

Linked