Standard-compliant host to network endianness conversion

Question

I am amazed at how many topics on StackOverflow deal with finding out the endianness of the system and converting endianness. I am even more amazed that there are hundreds of different answers to these two questions. All proposed solutions that I have seen so far are based on undefined behaviour, non-standard compiler extensions or OS-specific header files. In my opinion, this question is only a duplicate if an existing answer gives a standard-compliant, efficient (e.g., uses x86-bswap), compile time-enabled solution.

Surely there must be a standard-compliant solution available that I am unable to find in the huge mess of old "hacky" ones. It is also somewhat strange that the standard library does not include such a function. Perhaps the attitude towards such issues is changing, since C++20 introduced a way to detect endianness into the standard (via std::endian), and C++23 will probably include std::byteswap, which flips endianness.

In any case, my questions are these:

Starting at what C++ standard is there a portable standard-compliant way of performing host to network byte order conversion?
I argue below that it's possible in C++20. Is my code correct and can it be improved?
Should such a pure-c++ solution be preferred to OS specific functions such as, e.g., POSIX-htonl? (I think yes)

I think I can give a C++23 solution that is OS-independent, efficient (no system call, uses x86-bswap) and portable to little-endian and big-endian systems (but not portable to mixed-endian systems):

// requires C++23. see https://gcc.godbolt.org/z/6or1sEvKn
#include <bit>       // std::endian, std::byteswap
#include <concepts>  // std::integral

constexpr inline auto host_to_net(std::integral auto i) {
    static_assert(std::endian::native == std::endian::big || std::endian::native == std::endian::little);
    if constexpr (std::endian::native == std::endian::big) {
        return i;
    } else {
        return std::byteswap(i);
    }
}

Since std::endian is available in C++20, one can give a C++20 solution for host_to_net by implementing byteswap manually. A solution is described here, quote:

// requires C++17
#include <climits>      // CHAR_BIT
#include <cstddef>      // std::size_t
#include <cstdint>
#include <type_traits>  // std::make_unsigned
#include <utility>      // std::index_sequence, std::make_index_sequence

template<class T, std::size_t... N>
constexpr T bswap_impl(T i, std::index_sequence<N...>) {
  return ((((i >> (N * CHAR_BIT)) & (T)(unsigned char)(-1)) <<
           ((sizeof(T) - 1 - N) * CHAR_BIT)) | ...);
} //                                         ^~~~~ fold expression
template<class T, class U = typename std::make_unsigned<T>::type>
constexpr U bswap(T i) {
  return bswap_impl<U>(i, std::make_index_sequence<sizeof(T)>{});
}

The linked answer also provides a C++11 byteswap, but that one seems to be less efficient (not compiled to x86-bswap). I think there should be an efficient C++11 way of doing this, too (using either less template-nonsense or even more) but I don't care about older C++ and didn't really try.
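For illustration only, here is a minimal sketch of a byte swap written as a plain loop, which needs C++14 (loops in constexpr functions) rather than C++17 fold expressions. The name byteswap_loop is hypothetical and not taken from the linked answer. The sketch assumes 8-bit bytes and unsigned integer types, and whether a given compiler reduces the loop to a single bswap instruction would have to be checked per compiler.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical helper: reverses the byte order of an unsigned integer.
// Needs C++14 because of the loop inside a constexpr function.
template <class T>
constexpr T byteswap_loop(T value) noexcept {
    static_assert(std::is_unsigned<T>::value, "use an unsigned integer type");
    T result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Append the lowest remaining byte of `value` to the low end of `result`.
        result = static_cast<T>((result << 8) | (value & 0xFFu));
        value = static_cast<T>(value >> 8);
    }
    return result;
}

// Compile-time checks on fixed inputs.
static_assert(byteswap_loop(std::uint16_t{0x1122}) == 0x2211, "16-bit swap");
static_assert(byteswap_loop(std::uint32_t{0x11223344}) == 0x44332211u, "32-bit swap");
```

Because the function is constexpr, the same definition serves both the static_assert checks above and ordinary runtime calls.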

Assuming I am correct, the remaining question is: can one determine system endianness before C++20 at compile time in a standard-compliant and compiler-agnostic way? None of the answers here seem to achieve this. They use reinterpret_cast (not compile time), OS-headers, union aliasing (which I believe is UB in C++), etc. (Also, for some reason, they try to do it "at runtime" although a compiled executable will always run under the same endianness.)

One could do it outside of constexpr context and hope it's optimized away. On the other hand, one could use system-defined preprocessor definitions and account for all platforms, as seems to be the approach taken by Boost. Or maybe (although I would guess the other way is better?) use macros and pick platform-specific htonl-style functions from networking libraries (done, e.g., here (GitHub))?
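As a sketch of the first option (a runtime check that an optimizer may be able to fold into a constant), the byte order can be inspected by copying an integer's object representation into a byte array with std::memcpy, without relying on union type punning. The names byte_order and detect_byte_order are hypothetical. The sketch assumes 8-bit bytes and that std::uint32_t exists, and it cannot be used in a constant expression.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical result type for the check below.
enum class byte_order { little, big, other };

// Hypothetical helper: looks at how a known 32-bit value is laid out in memory.
inline byte_order detect_byte_order() noexcept {
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe] = {};
    std::memcpy(bytes, &probe, sizeof probe);  // copy out the object representation

    if (bytes[0] == 0x04 && bytes[1] == 0x03 && bytes[2] == 0x02 && bytes[3] == 0x01)
        return byte_order::little;
    if (bytes[0] == 0x01 && bytes[1] == 0x02 && bytes[2] == 0x03 && bytes[3] == 0x04)
        return byte_order::big;
    return byte_order::other;  // e.g. a mixed-endian layout
}
```

Whether the call disappears entirely after optimization depends on the compiler and flags, so this only approximates a compile-time answer.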

C++ doesn't define network byte order, so how can it define conversion *to* network byte order? Not sure what you're asking about here. — littleadv, Feb 06 '22 at 02:49
Generally you should just write endian-agnostic code that operates in terms of most-/least-significant-bytes and not big-endian vs. little-endian and byte-swapping, and that would guarantee portability. — jamesdlin, Feb 06 '22 at 03:35
About efficiency: What are the odds that the byte swapping is the bottleneck, and not the network transfer? — BoP, Feb 06 '22 at 03:51
@JohnFilleau I have three so-so answers in my last paragraph, not "the answer". Although this opinion is apparently heretical in a C++ context, I believe that ideally "there should be exactly one correct way of solving common problems". The problem is ubiquitous, as shown by all of the other questions on this topic. Wouldn't it be nice to have a modern pure-C++ solution? — Adomas Baliuka, Feb 06 '22 at 03:52
@littleadv C++23 will standardize `std::byteswap`. `C++20` standardized endianness. This means that one has a standard compliant conversion to network byte order, does it not? — Adomas Baliuka, Feb 06 '22 at 03:55
@AdomasBaliuka you're missing the point that "network byte order" is not defined by C++. It's defined by whatever network protocol you're using. — littleadv, Feb 06 '22 at 05:35
@littleadv Network byte order is well-defined in networking as big-endian. It doesn't change with different network protocols. But that doesn't mean a network protocol *must* use "network byte order". They can send the data in any order they like. — Galik, Feb 06 '22 at 08:29

eerorika · Accepted Answer · 2022-02-06T05:48:08.643

compile time-enabled solution.

Consider whether this is a useful requirement in the first place. The program isn't going to be communicating with another system at compile time. What is the case where you would need to use the serialised integer in a compile time constant context?

Starting at what C++ standard is there a portable standard-compliant way of performing host to network byte order conversion?

It's possible to write such a function in standard C++ since C++98. That said, later standards bring tasty template goodies that make this nicer.

There is no such function in the standard library as of the latest standard.

Should such a pure-c++ solution be preferred to OS specific functions such as, e.g., POSIX-htonl? (I think yes)

An advantage of POSIX is that it's less important to write tests to make sure that it works correctly.

An advantage of a pure C++ function is that you don't need platform-specific alternatives for systems that don't conform to POSIX.

Also, the POSIX htonX functions exist only for 16-bit and 32-bit integers. You could instead use the htobeXX functions that are available in some *BSD systems and in Linux (glibc).
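As a rough illustration of the htobeXX route, the sketch below serialises a 64-bit value with htobe64. It assumes a glibc-based Linux system, where these functions are declared in <endian.h> (some BSDs use <sys/endian.h> instead). The helper name store_be64 is hypothetical.

```cpp
#include <endian.h>   // non-standard header; assumed to be glibc's here
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: writes `value` into `out` as 8 big-endian octets.
inline void store_be64(unsigned char* out, std::uint64_t value) noexcept {
    assert(out != nullptr);
    const std::uint64_t be = htobe64(value);  // leaves the value unchanged on big-endian hosts
    std::memcpy(out, &be, sizeof be);         // copy the converted representation to the buffer
}
```

This keeps the platform dependency in one small function, which is the trade-off against the pure C++ approach.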

Here is what I have been using since C++17. Some notes beforehand:

Since endianness conversion is always¹ for purposes of serialisation, I write the result directly into a buffer. When converting to host endianness, I read from a buffer.
I don't use CHAR_BIT because the network doesn't know my byte size anyway. A network byte is an octet, and if your CPU's byte is different, then these functions won't work. Correct handling of non-octet bytes is possible, but it is unnecessary work unless you need to support network communication on such a system. Adding an assert might be a good idea.
I prefer to call it big endian rather than "network" endian. There's a chance that a reader isn't aware of the convention that the de-facto endianness of networks is big.
Instead of checking "if native endianness is X, do Y else do Z", I prefer to write a function that works with all native endiannesses. This can be done with bit shifts.

Yeah, it's constexpr. Not because it needs to be, but just because it can be. I haven't been able to produce an example where dropping constexpr would produce worse code.

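#include <cstddef>      // std::size_t
#include <type_traits>  // std::decay_t, std::make_unsigned_t
#include <utility>      // std::declval, std::index_sequence

// helper to promote an integer type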
template <class T>
using promote_t = std::decay_t<decltype(+std::declval<T>())>;

template <class T, std::size_t... I>
constexpr void
host_to_big_impl(
    unsigned char* buf,
    T t,
    [[maybe_unused]] std::index_sequence<I...>) noexcept
{
    using U = std::make_unsigned_t<promote_t<T>>;
    constexpr U lastI = sizeof(T) - 1u;
    constexpr U bits = 8u;
    U u = t;
    ( (buf[I] = u >> ((lastI - I) * bits)), ... );
}


template <class T>
constexpr void
host_to_big(unsigned char* buf, T t) noexcept
{
    using Indices = std::make_index_sequence<sizeof(T)>;
    return host_to_big_impl<T>(buf, t, Indices{});
}

¹ In all use cases I've encountered. Conversions from integer to integer can be implemented by delegating to these functions if you have such a case, although they cannot be constexpr due to the need for reinterpret_cast.
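The notes mention reading from a buffer when converting to host endianness. A minimal sketch of how that direction might look, using the same shift-based idea, follows. The name big_to_host and its signature are assumptions rather than code from this answer, and the sketch assumes 8-bit bytes and a non-bool integer type.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes (octets)");

// Hypothetical counterpart to host_to_big: builds a T from sizeof(T)
// big-endian octets, most significant octet first. It only uses shifts,
// so it does not depend on the host's native byte order.
template <class T>
constexpr T big_to_host(const unsigned char* buf) noexcept
{
    static_assert(std::is_integral<T>::value, "T must be an integer type");
    static_assert(sizeof(T) <= sizeof(unsigned long long), "T is too wide");
    using U = std::make_unsigned_t<T>;

    unsigned long long acc = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        acc = (acc << 8) | buf[i];  // shift earlier octets up, append the next one
    }
    // For signed T, converting an out-of-range value is implementation-defined
    // before C++20 and wraps modulo 2^N from C++20 on.
    return static_cast<T>(static_cast<U>(acc));
}

// Compile-time check on a fixed buffer.
constexpr unsigned char example_octets[] = {0x12, 0x34, 0x56, 0x78};
static_assert(big_to_host<std::uint32_t>(example_octets) == 0x12345678u, "32-bit read");
```

A plain loop is used here instead of a fold expression to keep the sketch short; it needs C++14 or later.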

"Consider whether this is useful in the first place." Don't you want to know this so you can potentially in-line the host-to-network and network-to-host functions (which will be either nop or byteswap)? — Ben, Feb 06 '22 at 03:52
@Ben Optimisers can do inlining and constant propagation even without constexpr. And yes, they can transform loops into byteswap. I wouldn't worry about it until you can demonstrate that the optimiser fails to use nop/byteswap. (even then, it may be premature to worry). — eerorika, Feb 06 '22 at 03:56
Fair enough. To be clear, you are saying so long as it's all inlinable, `if (!is_network_order()) { byteswap(x); }` will get optimized to either nop or `byteswap(x)`, right? — Ben, Feb 06 '22 at 04:02
@eerorika this is a ubiquitous problem. I don't think I need to prove that it's "the bottleneck" in my program to just want an optimal solution that I can just use without worrying if it's inlined, optimized or whatever. You seem to be objecting to my question because "it's trivial and who worries about it anyway". I agree that it should be that way. However, as it is, lots of people ask questions on the topic and we have over 100 different answers total, some of which are based on UB. I see people use UB in their networking "production code" all the time. Is that good? — Adomas Baliuka, Feb 06 '22 at 04:05
@AdomasBaliuka `You seem to be objecting to my question because "it's trivial and who worries about it anyway".` If I were objecting to your question, I would not have answered. My point is that unless you can demonstrate that you need to do serialisation at compile time, constexprness has no effect on the "optimality" of the solution. The standard doesn't guarantee that a constexpr function will be inlined any more than a non-constexpr function would be, unless you use it in a compile time constant context. — eerorika, Feb 06 '22 at 04:56
@AdomasBaliuka `over 100 different answers total, some of which are based on UB. I see people use UB in their networking "production code" all the time. Is that good?` I've added the 101st solution now. I don't think it solves the problem of 100 other answers existing, but you're free to use this one. — eerorika, Feb 06 '22 at 04:57

Adomas Baliuka · Answer 2 · 2022-03-03T15:19:44.130

I made a benchmark comparing my C++ solution from the question and the solution by eerorika from the accepted answer.

Looking at this is a complete waste of time, but now that I did it, I thought I might as well share it. The result is that (in the specific not-quite-realistic use case I look at) they seem to be equivalent in terms of performance. This is despite my solution being compiled to use x86-bswap, while the solution by eerorika does it by just using mov.

The performance seems to differ a lot (!!) when using different compilers and the main thing I learned from these benchmarks is, again, that I'm just wasting my time...

// Benchmark comparing two stand-alone C++20 host-to-big-endian endianness conversions.
// Run at quick-bench.com! This is not a complete program. (https://quick-bench.com/q/2qnr4xYKemKLZupsicVFV_09rEk)
// To run locally, include Google benchmark header and a main method as required by the benchmarking library.
// Adapted from https://stackoverflow.com/a/71004000/9988487
#include <bit>
#include <climits>
#include <concepts>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <ranges>
#include <type_traits>
#include <utility>
#include <vector>

/////////////////////////////// Solution 1 ////////////////////////////////

template <typename T> struct scalar_t { T t{}; /* no begin/end */ };
static_assert(not std::ranges::range< scalar_t<int> >);

template<class T, std::size_t... N>
constexpr T bswap_impl(T i, std::index_sequence<N...>) noexcept {
  constexpr auto bits_per_byte = 8u;
  static_assert(bits_per_byte == CHAR_BIT);
  return ((((i >> (N * bits_per_byte)) & (T)(unsigned char)(-1)) <<
           ((sizeof(T) - 1 - N) * bits_per_byte)) | ...);
} //                                              ^~~~~ fold expression

template<class T, class U = typename std::make_unsigned<T>::type>
constexpr U bswap(T i) noexcept {
  return bswap_impl<U>(i, std::make_index_sequence<sizeof(T)>{});
}

constexpr inline auto host_to_net(std::integral auto i) {
    static_assert(std::endian::native == std::endian::big || std::endian::native == std::endian::little);
    if constexpr (std::endian::native == std::endian::big) {
        return i;
    } else {
        return bswap(i);  // replace by `std::byteswap` once it's available!
    }
}

/////////////////////////////// Solution 2 ////////////////////////////////

// helper to promote an integer type
template <class T>
using promote_t = std::decay_t<decltype(+std::declval<T>())>;

template <class T, std::size_t... I>
constexpr void
host_to_big_impl(
    unsigned char* buf,
    T t,
    [[maybe_unused]] std::index_sequence<I...>) noexcept {
    using U = std::make_unsigned_t<promote_t<T>>;
    constexpr U lastI = sizeof(T) - 1u;
    constexpr U bits = 8u;
    U u = t;
    ( (buf[I] = u >> ((lastI - I) * bits)), ... );
}


template <class T, std::size_t... I>
constexpr void
host_to_big(unsigned char* buf, T t) noexcept {
    using Indices = std::make_index_sequence<sizeof(T)>;
    return host_to_big_impl<T>(buf, t, Indices{});
}

//////////////////////// Benchmarks ////////////////////////////////////

template<std::integral T>
std::vector<T> get_random_vector(std::size_t length, unsigned int seed) {
    // NOTE: IT IS VERY SLOW TO RECREATE RNG EVERY TIME. Don't use in production code!
    std::mt19937_64 rng{seed};
    std::uniform_int_distribution<T> distribution(
        std::numeric_limits<T>::min(), std::numeric_limits<T>::max());

    std::vector<T> result(length);
    for (auto && val : result) {
        val = distribution(rng);
    }
    return result;
}

template<>
std::vector<bool> get_random_vector<bool>(std::size_t length, unsigned int seed) {
    // NOTE: IT IS VERY SLOW TO RECREATE RNG EVERY TIME. ONLY USE FOR TESTING!
    std::mt19937_64 rng{seed};
    std::bernoulli_distribution distribution{0.5};

    std::vector<bool> vec(length);

    for (auto && val : vec) {
        val = distribution(rng);
    }
    return vec;
}

constexpr std::size_t n_ints{1000};


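// NOTE: despite its name, this benchmark measures host_to_big (the code under "Solution 2" above).
static void solution1(benchmark::State& state) {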
  std::vector<int> intvec = get_random_vector<int>(n_ints, 0);
  std::vector<std::uint8_t> buffer(sizeof(int)*intvec.size());

  for (auto _ : state) {
    for (std::size_t i{}; i < intvec.size(); ++i) {
        host_to_big(buffer.data() + sizeof(int)*i, intvec[i]);
    }
    
    benchmark::DoNotOptimize(buffer);
    benchmark::ClobberMemory();
  }
}
BENCHMARK(solution1);


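// NOTE: despite its name, this benchmark measures host_to_net (the code under "Solution 1" above).
static void solution2(benchmark::State& state) {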
  std::vector<int> intvec = get_random_vector<int>(n_ints, 0);
  std::vector<std::uint8_t> buffer(sizeof(int)*intvec.size());

  for (auto _ : state) {
    for (std::size_t i{}; i < intvec.size(); ++i) {
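        // NOTE: this assignment converts the result to std::uint8_t, so only its
        // lowest byte is stored; unlike host_to_big, it does not write sizeof(int) bytes.
        buffer[sizeof(int)*i] = host_to_net(intvec[i]);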
    }
    
    benchmark::DoNotOptimize(buffer);
    benchmark::ClobberMemory();
  }
}
BENCHMARK(solution2);
