31

A common question that comes up from time to time in the world of C++ programming is compile-time determination of endianness. Usually this is done with barely portable #ifdefs. But does the C++11 constexpr keyword along with template specialization offer us a better solution to this?

Would it be legal C++11 to do something like:

constexpr bool little_endian()
{
   const static unsigned num = 0xAABBCCDD;
   return reinterpret_cast<const unsigned char*> (&num)[0] == 0xDD;
}

And then specialize a template for both endian types:

template <bool LittleEndian>
struct Foo 
{
  // .... specialization for little endian
};

template <>
struct Foo<false>
{
  // .... specialization for big endian
};

And then do:

Foo<little_endian()>::do_something();
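For reference, the "barely portable #ifdefs" route combined with the template dispatch above might look roughly like the sketch below. It assumes the GCC/Clang predefined macros `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__`, which other compilers may not provide; `kLittleEndian`, `ByteOrderName` and `native_name` are hypothetical names used only for illustration.

```cpp
// Sketch only: relies on GCC/Clang predefined macros (an assumption, not standard C++).
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
constexpr bool kLittleEndian = (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);
#else
#error "Byte order macros not available; add a case for this compiler."
#endif

// Primary template: chosen when the argument is true (little endian).
template <bool LittleEndian>
struct ByteOrderName
{
    static constexpr const char* value() { return "little"; }
};

// Full specialization: chosen when the argument is false (anything else).
template <>
struct ByteOrderName<false>
{
    static constexpr const char* value() { return "not little"; }
};

// The constant selects the specialization at compile time.
constexpr const char* native_name = ByteOrderName<kLittleEndian>::value();
```

The dispatch itself is standard C++11; only the source of the boolean is compiler-specific here.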
Xeo
Charles Salvia

9 Answers

19

New answer (C++20)

C++20 has introduced a new standard library header, <bit>. Among other things, it provides a clean, portable way to check the endianness.

Since my old method relies on some questionable techniques, I suggest that anyone who uses it switch to the check provided by the standard library.

Here's an adapter that lets you use the new way of checking endianness without having to update code that relies on the interface of my old class:

#include <bit>

class Endian
{
public:
    Endian() = delete;

    static constexpr bool little = std::endian::native == std::endian::little;
    static constexpr bool big = std::endian::native == std::endian::big;
    static constexpr bool middle = !little && !big;
};

Old answer

I was able to write this:

#include <cstdint>

class Endian
{
private:
    static constexpr uint32_t uint32_ = 0x01020304;
    static constexpr uint8_t magic_ = (const uint8_t&)uint32_;
public:
    static constexpr bool little = magic_ == 0x04;
    static constexpr bool middle = magic_ == 0x02;
    static constexpr bool big = magic_ == 0x01;
    static_assert(little || middle || big, "Cannot determine endianness!");
private:
    Endian() = delete;
};

I've tested it with g++ and it compiles without warnings. It gives the correct result on x64. If you have a big-endian or middle-endian processor, please confirm in a comment that this works for you.

Piotr Siupa
  • 1
    what is `const uint8_t &` – Nick Aug 14 '17 at 14:16
  • @Nick It's a reference to constant 8-bit unsigned integer. – Piotr Siupa Aug 15 '17 at 15:55
  • 2
    i understand, but what is the benefit of such cast? why not just uint8_t without const and ref? – Nick Aug 15 '17 at 22:23
  • I'm not sure if it would work without casting to a reference. AFAIK this cast explicitly tells the compiler that it should read from the same memory address as the first variable. `(uint8_t&)var` is equivalent to `*(uint8_t*)&var`. The word `const` is there because a `constexpr` variable cannot be cast to non-const. – Piotr Siupa Aug 16 '17 at 00:15
  • yes, this is C trick. they want to get "leading" byte. very clever. – Nick Aug 16 '17 at 09:05
  • @Nick @NO_NAME I think going with `magic_ = (uint8_t)uint32_` yields undefined behavior as you don't know if it truncates MSB or LSB (at least in C: https://stackoverflow.com/a/34886065/4248972). My intuition is that the reference-cast resolves that issue. Can someone please confirm? – pasbi Aug 12 '18 at 17:15
  • 1
    I'm surprised this is allowed. Looks like some kind of omission in the standard. It should not be permitted as it precludes a large class of implementations. – n. m. could be an AI Sep 03 '18 at 05:42
  • 2
    This doesn't work. The cast to `const uint8_t&` creates a temporary by truncating the value of `uint32_` to 8 bits. This will always claim every target is little-endian. (Though that may be accurate enough in practice these days!) – Richard Smith Jul 31 '19 at 02:00
14

It is not possible to determine endianness at compile time using constexpr (before C++20). reinterpret_cast is explicitly forbidden by [expr.const]p2, as is iain's suggestion of reading from a non-active member of a union. Casting to a different reference type is also forbidden, as such a cast is interpreted as a reinterpret_cast.
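For contrast, the same inspection is fine at run time, where copying the object representation with `std::memcpy` is well defined; it just is not usable in a constant expression. A minimal sketch (the function names are illustrative, not from any library):

```cpp
#include <cstdint>
#include <cstring>

// Returns the byte stored at the lowest address of a 32-bit value.
inline unsigned char first_stored_byte(std::uint32_t value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // copy the object representation
    return bytes[0];
}

// Run-time check: true when the least significant byte comes first.
inline bool is_little_endian_runtime()
{
    return first_stored_byte(0x01020304u) == 0x04;
}
```

An optimizing compiler will usually fold this to a constant anyway, but the result cannot be used as a template argument or in a `static_assert`.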

Update:

This is now possible in C++20. One way (live):

#include <bit>
#include <climits>   // CHAR_BIT
#include <concepts>  // std::integral
template<std::integral T>
constexpr bool is_little_endian() {
  for (unsigned bit = 0; bit != sizeof(T) * CHAR_BIT; ++bit) {
    unsigned char data[sizeof(T)] = {};
    // In little-endian, bit i of the raw bytes ...
    data[bit / CHAR_BIT] = 1 << (bit % CHAR_BIT);
    // ... corresponds to bit i of the value.
    if (std::bit_cast<T>(data) != T(1) << bit)
      return false;
  }
  return true;
}
static_assert(is_little_endian<int>());

(Note that C++20 guarantees two's complement integers -- with an unspecified bit order -- so we just need to check that every bit of the data maps to the expected place in the integer.)

But if you have a C++20 standard library, you can also just ask it:

#include <bit>
constexpr bool is_little_endian = std::endian::native == std::endian::little;
Richard Smith
12

Assuming N2116 is the wording that gets incorporated, then your example is ill-formed (notice that there is no concept of "legal/illegal" in C++). The proposed text for [decl.constexpr]/3 says

  • its function-body shall be a compound-statement of the form { return expression; } where expression is a potential constant expression (5.19);

Your function violates the requirement in that it also declares a local variable.
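To illustrate the single-return-statement rule quoted above (it did make it into C++11 and was relaxed in C++14), here is a small sketch. The functions `swap16_bad` and `swap16` are hypothetical examples, not part of the question.

```cpp
#include <cstdint>

// Not accepted by the C++11 rules: a local declaration precedes the return.
// constexpr std::uint16_t swap16_bad(std::uint16_t v)
// {
//     const std::uint16_t hi = v >> 8;
//     return (v << 8) | hi;
// }

// Accepted by the C++11 rules: the body is a single return statement.
constexpr std::uint16_t swap16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

static_assert(swap16(0x1234) == 0x3412, "evaluated at compile time");
```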

Edit: This restriction could be overcome by moving num outside of the function. The function still wouldn't be well-formed, then, because expression needs to be a potential constant expression, which is defined as

An expression is a potential constant expression if it is a constant expression when all occurrences of function parameters are replaced by arbitrary constant expressions of the appropriate type.

IOW, reinterpret_cast<const unsigned char*> (&num)[0] == 0xDD would have to be a constant expression. However, it is not: &num would be an address constant-expression (5.19/4). Accessing the value of such a pointer is, however, not allowed for a constant expression:

The subscripting operator [] and the class member access . and -> operators, the & and * unary operators, and pointer casts (except dynamic_casts, 5.2.7) can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators.

Edit: The above text is from C++98. Apparently, C++0x is more permissive in what it allows for constant expressions. The expression involves an lvalue-to-rvalue conversion of the array reference, which is banned from constant expressions unless

it is applied to an lvalue of effective integral type that refers to a non-volatile const variable or static data member initialized with constant expressions

It's not clear to me whether (&num)[0] "refers to" a const variable, or whether only a literal num "refers to" such a variable. If (&num)[0] refers to that variable, it is then unclear whether reinterpret_cast<const unsigned char*> (&num)[0] still "refers to" num.

Martin v. Löwis
  • I don't feel it applies, here. The static variable is constant itself. – GManNickG Oct 18 '09 at 05:30
  • The wording in 4.1 of N2116 states that the body of the function must only have one statement (that being the return statement). Mind you, from my quick glance over the text, I don't see anything prohibiting the above code if num is defined globally. – GRB Oct 18 '09 at 05:49
  • @GMan: as GRB says, the draft is fairly clear that you must have only one statement, and a declaration *is* a statement (C++98, 6.7, Declaration statement). @GRB: I'll edit my response to discuss moving the constant outside of the function. – Martin v. Löwis Oct 18 '09 at 06:11
  • +1, thanks for clearing that up Martin. While I did suggest moving the variable as a possibility, the idea that `(&num)[0]` would pass as 'constant' didn't sit well with me. That said, I've always been meaning to do some more reading as to what will and will not be allowed in `constexpr` functions, which hopefully I'll get around to soon ;) – GRB Oct 18 '09 at 06:33
  • 2
    The last quoted paragraph does not seem to be part of the latest c++0x draft (n2960). The draft says that `&num` is a constant expression if `num` is not a variable or data-member of thread or automatic storage duration (read: if `num` is a local static or namespace scope variable without the "thread_local" specifier, then `&num` is a constant expression). However the `reinterpret_cast` makes it a non-constant expression, because it constitutes a conversion of pointer type to a literal type (notice that pointer types are itself literal types). – Johannes Schaub - litb Oct 18 '09 at 15:52
  • It indeed looks like `(&num)[0]` is a constant expression in C++0x. I would be glad if you find wording in n2960 that states otherwise. – Johannes Schaub - litb Oct 18 '09 at 15:54
  • Ah. I think there is nothing wrong with (&num)[0]. You can only have `0` as index anyway, and then it's equivalent to `num` here, which is easy for the compiler to see, i think. @GRB, notice though that he is not doing `(&num)[0]` in this sense, but he's doing `reinterpret_cast<...>(...)[N]`. So he indexes the result of the `reinterpret_cast`, not the result of `(&num)` directly. Haven't noticed that you were talking about his code when showing `(&num)[0]` first :) In any case, surely because he has > 1 statement in the function, and because of the `reinterpret_cast`, his code is ill-formed :) – Johannes Schaub - litb Oct 18 '09 at 16:17
  • yeah sorry about the confusion litb, I meant for `(&num)[0]` to specifically refer to the cast in his code (I was too lazy to type out the entire `reinterpret_cast`... lol, sorry ;D) – GRB Oct 18 '09 at 17:05
  • @litb: the text banning dereferencing pointers is indeed from C++98. I see in C++0x this whole text has been rephrased. As for `reinterpret_cast`: I cannot see where it is banned from a constant expression in C++0x, so I then think it should be well-formed. However, since it dereferences the wrong pointer type, it has undefined behavior (which, in turn, is the whole point of the endianness test). – Martin v. Löwis Oct 18 '09 at 19:51
  • The `reinterpret_cast` is not a constant expression because of the point where it says "a type conversion from a pointer or pointer-to-member type to a literal type" in `5.19/2`. The dereference is not undefined behavior, because you are allowed to read the underlying bytes of any trivially copyable type by using `char*` or `unsigned char*` (see `3.9/2`). – Johannes Schaub - litb Oct 18 '09 at 21:19
  • I agree with you it could be clearer about the term "variable" though. The term is defined in `3/6` as "A variable is introduced by the declaration of an object. The variable's name denotes the object.". So, i think it wants to say that a "variable" is a translation time entity that has a name which denotes an object. So the following yields an integral constant expression: `int const n = 0; constexpr int f() { return n; }`, but the following not, because the lvalue refers to an object, but not to a variable (the wording is surely confusing here and slightly backwards imho): `return (&n)[0];` – Johannes Schaub - litb Oct 18 '09 at 21:29
  • Notice that i think `(&n)[0]` is itself a constant expression. So you can do `int &p = (&n)[0];` and `p` is going to be constant-initialized. But the lvalue-to-rvalue conversion on `(&n)[0]` that happens in the example f function above (when returning the `int` value) would not be allowed in a constant expression. So the following is not constant-initialized i think: `int const a = +(&n)[0];` (according to the rule in `3.6.2/2`), and so its initialization time compared to another such object in another translation unit is unspecified. – Johannes Schaub - litb Oct 18 '09 at 21:36
  • So it seems the consensus here is that the little_endian() function is definitely malformed, because it consists of more than one statement. A solution would be to move the declaration outside of the function, but even then, it is questionable at this time whether a reinterpret_cast is allowed in a constant expression. – Charles Salvia Oct 19 '09 at 02:55
  • 2
    No, that is not questionable. It's certain that it's not allowed. The wording is clear. – Johannes Schaub - litb Oct 19 '09 at 02:59
  • @litb: I disagree that the `reinterpret_cast` is disallowed. 3.9/2 defines a literal type as either a scalar type, a class type with only literal members, or a literal array; 5.19/2 only bans conversions to such a type. The proposed function converts one pointer type to another pointer type; such a conversion is not banned. – Martin v. Löwis Oct 19 '09 at 04:06
  • 1
    And pointer types are scalar types. :) BTW i think you are being confused by the above `(&num)[0]` too: In the code, he never does `(&num)[0]`. He is doing `(reinterpret_cast<...>(&num))[0]`. So you have to first consider the reinterpret_cast, and then it is `result_of_reinterpret_cast[0]`. Your last paragraph indicates that you get the binding of it wrong, which is quite confusing to readers. – Johannes Schaub - litb Oct 19 '09 at 13:42
  • @Martinv.Löwis I think you must be referring to pre-standard wording. 5.19/2 explicitly says `reinterpret_cast` is not permitted in a core constant expression (and hence not in a constant expression). – Richard Smith Jan 02 '12 at 19:42
7

There is std::endian in the upcoming C++20.

#include <bit>

constexpr bool little_endian() noexcept
{
    return std::endian::native == std::endian::little;
}
magras
5

My first post. Just wanted to share some code that I'm using.

//Some handy defines magic, thanks overflow
#define IS_LITTLE_ENDIAN  ('ABCD'==0x41424344UL) //41 42 43 44 = 'ABCD' hex ASCII code
#define IS_BIG_ENDIAN     ('ABCD'==0x44434241UL) //44 43 42 41 = 'DCBA' hex ASCII code
#define IS_UNKNOWN_ENDIAN (IS_LITTLE_ENDIAN == IS_BIG_ENDIAN)

//Next in code...
struct Quad
{
    union
    {
#if IS_LITTLE_ENDIAN
        struct { std::uint8_t b0, b1, b2, b3; };

#elif IS_BIG_ENDIAN
        struct { std::uint8_t b3, b2, b1, b0; };

#elif IS_UNKNOWN_ENDIAN
#error "Endianness not implemented!"
#endif

        std::uint32_t dword;
    };
};

Constexpr version:

#include <cstddef>  // size_t
#include <cstdint>  // std::uint32_t

namespace Endian
{
    namespace Impl //Private
    {
        //41 42 43 44 = 'ABCD' hex ASCII code
        static constexpr std::uint32_t LITTLE_{ 0x41424344u };

        //44 43 42 41 = 'DCBA' hex ASCII code
        static constexpr std::uint32_t BIG_{ 0x44434241u };

        //Converts chars to uint32 on current platform
        static constexpr std::uint32_t NATIVE_{ 'ABCD' };
    }



    //Public
    enum class Type : size_t { UNKNOWN, LITTLE, BIG };

    //Compare
    static constexpr bool IS_LITTLE   = Impl::NATIVE_ == Impl::LITTLE_;
    static constexpr bool IS_BIG      = Impl::NATIVE_ == Impl::BIG_;
    static constexpr bool IS_UNKNOWN  = IS_LITTLE == IS_BIG;

    //Endian type on current platform
    static constexpr Type NATIVE_TYPE = IS_LITTLE ? Type::LITTLE : IS_BIG ? Type::BIG : Type::UNKNOWN;



    //Uncomment for test. 
    //static_assert(!IS_LITTLE, "This platform has little endian.");
    //static_assert(!IS_BIG, "This platform has big endian.");
    //static_assert(!IS_UNKNOWN, "Error: Unsupported endian!");
}
gpdaniels
Andrew F.
2

That is a very interesting question.

I am not a language lawyer, but you might be able to replace the reinterpret_cast with a union.

const union {
    unsigned int int_value;
    unsigned char char_value[4];
} Endian = { 0xAABBCCDD };

constexpr bool little_endian()
{
   // Inspect the first byte of the int through the char array member.
   return Endian.char_value[0] == 0xDD;
}
iain
  • 1
    Placing a value in a union then accessing the union via another member is not valid. – GManNickG Oct 18 '09 at 20:41
  • 11
    @GMan: It is well-formed, but invokes undefined behavior. "valid" is not a property defined in the C++ standard. – Martin v. Löwis Oct 18 '09 at 21:02
  • Yea, threw my own terminology in there. Thanks for pointing out the correct terms. – GManNickG Oct 19 '09 at 08:45
  • 3
    @Martin: Exactly what § of the standard says it invokes undefined behaviour? A char lvalue may certainly alias (part of) an int object. Also, all possible bit patterns represent valid char and unsigned char values as far as I can tell. This leads me to believe this just invokes implementation-defined behaviour and not UB. – sellibitze Oct 19 '09 at 17:08
  • 1
    @sellibitze: aliasing pointers with `char*` would be fine, but not via a union. – Mooing Duck May 21 '12 at 19:11
  • 2
    @Martinv.Löwis clang gives an error with a note that reading a non-active member in a union is not allowed at all in a constant expression. Normally it's undefined behavior, but it looks like it's ill-formed in a constant expression. – bames53 Aug 21 '12 at 18:18
  • I agree with all the comments that this is not good C++ and invokes undefined behavior according to the C++ spec. I would not recommend that any code rely on this now. Although I used it to great effect on Windows in the 90's to create a quick hash of a string based on its size (as a short) and first two characters. – iain Oct 31 '12 at 08:36
  • Nice, but does not compile - [x86-64 clang 5.0.0 #1] note: read of member 'char_value' of union with active member 'int_value' is not allowed in a constant expression – Nick Aug 13 '18 at 13:05
  • 1
    @Nick, I'm not surprised. Tricks like this used to be very common on x86 20-odd years ago (especially on Windows); porting that code was a nightmare. It's good that this no longer compiles. I haven't used C++ in almost 10 years, but this used to compile on Visual Studio around that time, again not the most standards-compliant compiler then (but neither was my other compiler on Solaris). – iain Aug 14 '18 at 14:14
  • This can't compile as constexpr: https://godbolt.org/z/qPTT5P (note that other issues have been fixed, but UB is detected and reported). – Marek R Nov 26 '20 at 17:26
1

This may seem like cheating, but you can always include endian.h... BYTE_ORDER == BIG_ENDIAN is a valid constexpr...

Christopher Smith
  • 1
    Not all systems have endian.h; also, the MacOS and BSD endian.h headers emit a ton of warnings. – Nick Aug 13 '18 at 13:07
1

Here is a simple C++11 compliant version, inspired by @no-name answer:

constexpr bool is_system_little_endian(int value = 1) {
    return static_cast<const unsigned char&>(value) == 1;
}

Using a default argument to cram everything into one line is a way to meet the C++11 requirements on constexpr functions: they may contain only a single return statement.

The good thing with doing it (and testing it!) in a constexpr context is that it makes sure that there is no undefined behavior in the code.

On compiler explorer here.

-4

If your goal is to ensure that the compiler optimizes little_endian() into a constant true or false at compile-time, without any of its contents winding up in the executable or being executed at runtime, and only generating code from the "correct" one of your two Foo templates, I fear you're in for a disappointment.

I also am not a language lawyer, but it looks to me like constexpr is like inline or register: a keyword that alerts the compiler writer to the presence of a potential optimization. Then it's up to the compiler writer whether or not to take advantage of that. Language specs typically mandate behaviors, not optimizations.

Also, have you actually tried this on a variety of C++0x compliant compilers to see what happens? I would guess most of them would choke on your dual templates, since they won't be able to figure out which one to use if invoked with false.

Bob Murphy
  • 1
    It's not quite the same. The result of a 'constexpr' function generally can be used where a constant expression is required, eg. an array bounds. Although I believe there is some leeway in the case of function templates. – Richard Corden Oct 19 '09 at 18:11
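To make the point in the comment above concrete, here is a minimal sketch of a `constexpr` function whose result is used where the language requires a constant expression. The names `square`, `buffer` and `Tag` are illustrative only.

```cpp
// A constexpr function whose result is required at compile time below.
constexpr int square(int x)
{
    return x * x;
}

// Array bound: must be a constant expression.
char buffer[square(4)];

// Non-type template argument: must also be a constant expression.
template <int N>
struct Tag
{
    static constexpr int value = N;
};

static_assert(sizeof(buffer) == 16, "the bound was computed at compile time");
static_assert(Tag<square(3)>::value == 9, "usable as a template argument");
```

In these contexts the compiler has to evaluate the call during translation; it is not an optional optimization the way `inline` or `register` are.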