I have a simple program that does some precalculations and serializes a struct into a binary file, which is then loaded by another program. While this works on my machine, could it cause problems if I rely on sizeof()? I know that variable sizes may vary between processors, which is why I'm thinking about hardcoding them, since the files to load are only ever created on my computer.

Werem

1 Answer

When we want to store data in a file or transmit it across a network, we ought to have a few goals in mind:

  1. portability - the ability to read and write the data on any host, in any language, with any processor architecture.

  2. speed - network transmission is several orders of magnitude slower than memory access, and disk I/O is also far slower. The less data we send and receive, the more responsive our applications will be.

One popular way to achieve both goals with signed integers (there are other techniques for other types) is zig-zag encoding combined with a variable-length byte encoding.

It is used in Google's protocol buffers and many other data transmission schemes.

Zig-zag encoding maps signed integers to unsigned ones so that values of small magnitude, whether positive or negative, become small unsigned values. Written with a variable-length encoding, the number of bytes transmitted is then dictated by the magnitude of the number, not by the bit width of its type. Most numbers are small, so it makes no sense to transmit all the leading 1s of a small negative number; they can be implied.
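To make the mapping concrete, here is a minimal sketch restricted to 32-bit values. The names `zigzag32`, `unzigzag32`, `varint_size`, and `check_round_trip` are made up for this illustration, and the sketch assumes C++14 or later (for the loop inside a `constexpr` function).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
constexpr std::uint32_t zigzag32(std::int32_t n)
{
    // Shift in the unsigned domain, then flip every bit if n was negative.
    const std::uint32_t u = static_cast<std::uint32_t>(n);
    return (u << 1) ^ (n < 0 ? ~std::uint32_t{0} : std::uint32_t{0});
}

// Hypothetical helper: inverse of zigzag32.
constexpr std::int32_t unzigzag32(std::uint32_t z)
{
    // The low bit carries the sign; the remaining bits carry the magnitude.
    const std::uint32_t half = z >> 1;
    return (z & 1u) ? -static_cast<std::int32_t>(half) - 1
                    : static_cast<std::int32_t>(half);
}

// Hypothetical helper: number of 7-bit groups (bytes) a varint would need.
constexpr int varint_size(std::uint32_t z)
{
    int n = 1;
    while (z >= 0x80u) { z >>= 7; ++n; }
    return n;
}

// Small magnitudes of either sign map to small unsigned values.
static_assert(zigzag32(0) == 0u, "0 maps to 0");
static_assert(zigzag32(-1) == 1u, "-1 maps to 1");
static_assert(zigzag32(1) == 2u, "1 maps to 2");
static_assert(zigzag32(-2) == 3u, "-2 maps to 3");

// -1 needs one byte after zig-zag, but five if its raw bit pattern is used.
static_assert(varint_size(zigzag32(-1)) == 1, "one byte with zig-zag");
static_assert(varint_size(static_cast<std::uint32_t>(-1)) == 5, "five bytes without");

// Runtime round-trip check for an arbitrary value.
inline void check_round_trip(std::int32_t n)
{
    assert(unzigzag32(zigzag32(n)) == n);
}
```

The checks at the bottom show the point of the scheme: without zig-zag, every small negative number would cost the maximum number of bytes.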

Here is an implementation of zig-zag encoding that works for ints of 16, 32, and 64 bits.

Extend at your leisure.

Note that unsigned integers don't need to be zig-zag encoded, and strings are easy - encode the length using a variable-length integer followed by N bytes of string data.

#include <cstddef>
#include <cstdint>
#include <cassert>
#include <limits>
#include <memory>
#include <cstring>

namespace notstd {
    using byte = std::uint8_t;
}

template<class SignedInt> struct unsigned_version;
template<> struct unsigned_version<std::int16_t> { using type = std::uint16_t; };
template<> struct unsigned_version<std::int32_t> { using type = std::uint32_t; };
template<> struct unsigned_version<std::int64_t> { using type = std::uint64_t; };
template<class SignedInt> using unsigned_version_t = typename unsigned_version<SignedInt>::type;

template<class UnSignedInt> struct signed_version;
template<> struct signed_version<std::uint16_t> { using type = std::int16_t; };
template<> struct signed_version<std::uint32_t> { using type = std::int32_t; };
template<> struct signed_version<std::uint64_t> { using type = std::int64_t; };
template<class UnSignedInt> using signed_version_t = typename signed_version<UnSignedInt>::type;

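template<class SignedInt>
auto zig_zag(SignedInt input) -> unsigned_version_t<SignedInt>
{
    using word_type = unsigned_version_t<SignedInt>;
    // Shift in the unsigned domain: left-shifting a negative signed value is
    // undefined behaviour before C++20.
    const auto shifted = word_type(word_type(input) << 1);
    // All ones when the input is negative, all zeros otherwise.
    const auto sign_mask = input < 0 ? word_type(~word_type(0)) : word_type(0);
    return word_type(shifted ^ sign_mask);
}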

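template<class UnsignedInt>
auto unzig_zag(UnsignedInt input) -> signed_version_t<UnsignedInt>
{
    const bool negative = (input & 1) != 0;
    // Cast back to UnsignedInt: integer promotion would otherwise make these
    // expressions int for 16-bit inputs, and the memcpy below would then copy
    // the wrong bytes on a big-endian host.
    auto accum = UnsignedInt(input >> 1);
    if (negative)
        accum = UnsignedInt(~accum);
    auto result = signed_version_t<UnsignedInt>();
    static_assert(sizeof(result) == sizeof(accum), "size mismatch");
    std::memcpy(std::addressof(result), std::addressof(accum), sizeof(result));
    return result;
}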

template<class SignedInt, class OutIter>
auto serialise(SignedInt input, OutIter iter) -> OutIter
{
    using notstd::byte;

    auto shifter = zig_zag(input);

    bool last_byte = false;
    do
    {
        if (shifter < 128)
            last_byte = true;
        auto val = byte(shifter & 0x7f);
        if (not last_byte) val |= byte(0x80);
        *iter++ = val;
        shifter >>= 7;
    } while (not last_byte);

    return iter;
}

template<class SignedInt, class InIter, class Sentinel>
auto deserialise(InIter& iter, Sentinel last) -> SignedInt
{
    using notstd::byte;

    using accum_type = unsigned_version_t<SignedInt>;
    auto accum = accum_type(0);
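    int shift = 0;
    constexpr int bit_count = std::numeric_limits<accum_type>::digits;
    // Stop at the end of input or once every bit of the target type is filled;
    // shifting by bit_count or more would be undefined behaviour.
    // Note: truncated input silently yields a partial value here; production
    // code should report that as an error.
    while (iter != last && shift < bit_count)
    {
        auto val = byte(*iter++);
        accum |= accum_type(accum_type(val & 0x7f) << shift);
        if ((val & 0x80) == 0)
        {
            break;
        }
        shift += 7;
    }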
    return unzig_zag(accum);
}


#include <vector>
#include <iterator>


int main()
{
    using notstd::byte;
    auto buffer = std::vector<byte>();

    std::int32_t i = 16;
    auto iz = zig_zag(i);
    auto iuz = unzig_zag(iz);
    assert(i == iuz);

    i = -16;
    iz = zig_zag(i);
    iuz = unzig_zag(iz);
    assert(i == iuz);

    auto i1 = std::int16_t(3);
    auto i2 = std::int32_t(8736);
    auto i3 = std::int64_t(-7333738);

    auto iout = serialise(i1, back_inserter(buffer));
    iout = serialise(i2, iout);
    iout = serialise(i3, iout);


    auto iin = begin(buffer);
    auto o1 = deserialise<decltype(i1)>(iin, end(buffer));
    auto o2 = deserialise<decltype(i2)>(iin, end(buffer));
    auto o3 = deserialise<decltype(i3)>(iin, end(buffer));

    assert(i1 == o1);
    assert(i2 == o2);
    assert(i3 == o3);
    assert(iin == end(buffer));
}
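
As for the earlier note on unsigned integers and strings, the idea could be sketched as follows. The helper names (`put_varint`, `get_varint`, `put_string`, `get_string`, `check_string_round_trip`) are hypothetical and not part of the code above; the sketch assumes a `std::vector<std::uint8_t>` buffer and does only basic validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append an unsigned value as a base-128 varint: 7 payload bits per byte,
// high bit set on every byte except the last. No zig-zag step is needed.
inline void put_varint(std::vector<std::uint8_t>& out, std::uint64_t value)
{
    while (value >= 0x80u) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7Fu) | 0x80u));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Read a varint starting at pos. Returns false if the input ends before the
// terminating byte or if the encoding runs past 10 bytes.
inline bool get_varint(const std::vector<std::uint8_t>& in, std::size_t& pos,
                       std::uint64_t& value)
{
    value = 0;
    for (int shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t b = in[pos++];
        value |= static_cast<std::uint64_t>(b & 0x7Fu) << shift;
        if ((b & 0x80u) == 0) return true;
    }
    return false;
}

// A string is its length as a varint, followed by that many raw bytes.
inline void put_string(std::vector<std::uint8_t>& out, const std::string& s)
{
    put_varint(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
}

inline bool get_string(const std::vector<std::uint8_t>& in, std::size_t& pos,
                       std::string& s)
{
    std::uint64_t len = 0;
    if (!get_varint(in, pos, len)) return false;
    if (len > in.size() - pos) return false;  // length claims more than is left
    const std::size_t n = static_cast<std::size_t>(len);
    s.assign(reinterpret_cast<const char*>(in.data()) + pos, n);
    pos += n;
    return true;
}

// Usage: write a string, read it back, and confirm the buffer is consumed.
inline void check_string_round_trip(const std::string& text)
{
    std::vector<std::uint8_t> buffer;
    put_string(buffer, text);
    std::size_t pos = 0;
    std::string decoded;
    const bool ok = get_string(buffer, pos, decoded);
    assert(ok && decoded == text && pos == buffer.size());
    (void)ok;
}
```

Because the length prefix is itself a varint, a short string costs a single byte of overhead, and no fixed `sizeof` appears anywhere in the file format.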
Richard Hodges