16

I had a discussion this morning with a colleague regarding the correctness of a "coding trick" to detect endianness.

The trick was:

bool is_big_endian()
{
  union
  {
    int i;
    char c[sizeof(int)];
  } foo;


  foo.i = 1;
  return (foo.c[0] == 1);
}

To me, it seems that this usage of a union is incorrect because setting one member of the union and reading another is not well-defined. But I have to admit that this is just a feeling and I lack actual proof to strengthen my point.

Is this trick correct? Who is right here?

timrau
ereOn
  • 5
    At least the gcc folks recommend this for type punning instead of casting to another type, which is even less well-defined :-) – Gunther Piez May 26 '11 at 09:04
  • @drhirsch: I agree. But if it is just a disguised cast, shouldn't it be just as bad? – ereOn May 26 '11 at 09:07
  • 1
    It isn't just a disguised cast. It has more precise semantics about the memory locations of the elements of the union than a simple cast. Some more on that in http://gcc.gnu.org/onlinedocs/gcc-4.6.0/gcc/Optimize-Options.html#index-fstrict_002daliasing-824 – Gunther Piez May 26 '11 at 09:18
  • Indeed, you'll never get a fully-defined solution for nonsense like this :P – Lightness Races in Orbit May 26 '11 at 09:21
  • Isn't a `#define BIG_ENDIAN` much easier than this? You will still have to detect OS and other stuff on most systems anyway. – Bo Persson May 26 '11 at 11:42
  • @Bo Persson: Unfortunately, on some architectures (IBM) you have to detect endianness at runtime for it to work reliably. – ereOn May 27 '11 at 11:03

5 Answers

13

Your code is not portable. It might work on some compilers or it might not.

You are right about the behaviour being undefined when you try to access the inactive member of the union [as it is in the case of the code given]

$9.5/1

In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time.

So foo.c[0] == 1 is incorrect because c is not active at that moment. Feel free to correct me if you think I am wrong.

Prasoon Saurav
  • 2
    I cannot find the quote in the C++0x standard, I found one that partially lifts this restriction for the case `struct A { int a; char b; };` `struct B { int a; double c; };` `union U { A a; B b; };` and says that accessing `u.a.a` or `u.b.a` is always possible (because it's a common prefix sequence of A and B) but nothing on undefinedness :/ – Matthieu M. May 26 '11 at 09:09
  • 6
    @Matthieu M.: This is what I think is relevant in this context: $9.5 `"In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time."` Attempting to access the inactive member of a union leads to undefined behavior. – Prasoon Saurav May 26 '11 at 09:13
  • 1
    Non portable it is for sure. But that doesn't mean it is not well defined – Gunther Piez May 26 '11 at 09:19
  • @drhirsch : I think 'unspecified' qualifies as 'not well-defined'. – ildjarn May 26 '11 at 09:24
  • 5
    @drhirsch: There's well defined, and then there's well defined. As far as the standard is concerned, the behavior is undefined. Individual compilers may place additional guarantees, of course. – Dennis Zickefoose May 26 '11 at 09:31
  • Obviously, it's not well-defined by ISO C++. However, it's often well-defined in platform C++ ABIs, as they need to specify member layouts to achieve link-compatibility. IIRC, GCC uses the "Itanium" Intel C++ ABI which does allow this. – MSalters May 26 '11 at 09:35
  • 2
    The MISRA-C standard bans unions because of various portability issues, mainly concerning alignment and padding. They ban the use of "variants", for example. However, they make one exception to the rule and allow unions for data packing with one char array of the same size as the other data type(s) in the union. And this makes sense, because _data packing is likely the only sane use for the union keyword in the C/C++ language_. If this were undefined behavior, that would effectively make unions useless. They should have been removed from the language if that is the case. – Lundin May 26 '11 at 09:40
  • @Lundin : They can still be very useful if you're tight on stack space (e.g. in embedded environments). – ildjarn May 26 '11 at 10:10
  • I am by no means an expert on MISRA-C, but I'm pretty sure they "forbid" Unions for safety reasons, not portability reasons. I'm not sure what you mean by your exception, though. How does creating a char array help with packing in the slightest? – Dennis Zickefoose May 26 '11 at 10:28
  • 1
    @ildjarn What you describe is also a major no-no in embedded systems, and also one of the reasons why MISRA bans unions. It is in the same category as "I'm low on memory so I'll use this tx buffer for something completely unrelated while I'm not sending anything". If you are low on stack space, optimize the code. If you still are low, increase the stack space. If you still are low, you picked the wrong MCU for your project. – Lundin May 26 '11 at 11:15
  • @Dennis Actually, I think they forbid them for both reasons. What MISRA means with data packing in this context, is that you have some sort of bus with a tx/rx register that is 1 byte large. If you wish to send an int over this bus, a good solution is to use a union for the purpose, to chop up the int into bytes without the need to copy or mask out anything. Similarly, data types describing hardware registers can be defined as unions, to make it possible to for example address the data by byte or by word. – Lundin May 26 '11 at 11:20
  • 1
    @Lundin: or you can use `boost::variant`, which builds a type-safe union and knows, at any point in time, which of the data is active... and prevents you from accessing the others. If a logic condition of the program is that exactly one of N fields is active at any time, then a `boost::variant` is a *logical* expression of this property. – Matthieu M. May 26 '11 at 11:22
  • It's a pity that the standard does not make an exception for char arrays as it does for pointers (strict aliasing rules). Conversion with unions would be more readable than with casts. – Jan Hudec Nov 14 '11 at 10:16
7

Don't do this; use something like the following instead:

#include <arpa/inet.h>
//#include <winsock2.h> // <-- for Windows use this instead

#include <stdint.h>

bool is_big_endian() {
  uint32_t i = 1;
  return i == htonl(i);
}

Explanation:

The htonl function converts a u_long from host to TCP/IP network byte order (which is big-endian).
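
To make the comparison above concrete, here is a minimal sketch, assuming a POSIX system that provides <arpa/inet.h> and 8-bit bytes. The helper name network_order_byte is hypothetical and not part of any library. It copies the result of htonl into a byte buffer: for the value 1 the bytes in memory should be 00 00 00 01 on any host, so i == htonl(i) can only hold when the host already stores integers most significant byte first.

```cpp
#include <arpa/inet.h>  // htonl (POSIX; on Windows use <winsock2.h> instead)
#include <cstring>      // std::memcpy
#include <stdint.h>

// Hypothetical helper: returns the byte at position `index` (0..3) of
// htonl(value), as that result is laid out in memory.
unsigned char network_order_byte(uint32_t value, unsigned index)
{
    uint32_t net = htonl(value);
    unsigned char bytes[sizeof net];
    std::memcpy(bytes, &net, sizeof net);  // copying out the bytes is well-defined
    return bytes[index];
}
```

For example, network_order_byte(1, 3) yields 1 and network_order_byte(1, 0) yields 0, regardless of the host's own byte order.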


moooeeeep
2

You're correct that that code doesn't have well-defined behavior. Here's how to do it portably:

#include <cstring>

bool is_big_endian()
{
    static unsigned const i = 1u;
    char c[sizeof(unsigned)] = { };
    std::memcpy(c, &i, sizeof(c));
    return !c[0];
}

// or, alternatively

bool is_big_endian()
{
    static unsigned const i = 1u;
    return !*static_cast<char const*>(static_cast<void const*>(&i));
}

ildjarn
  • 1
    I think even this wouldn't be portable in ALL cases. It would return true on a middle-endian architecture, which would end up with the 1 in the third byte. So why even bother with this? I would still use a union, knowing that it might not be perfect, but at least not under the wrong assumption that my code would work in all cases – Gunther Piez May 26 '11 at 10:00
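
One way to address the middle-endian concern raised in the comment is to inspect every byte of a probe value instead of only the first one. The following is a sketch under stated assumptions, not part of the original answer: it assumes uint32_t exists with 8-bit bytes, and the names byte_order, classify and host_byte_order are hypothetical.

```cpp
#include <cstring>   // std::memcpy
#include <stdint.h>

enum byte_order { order_little, order_big, order_other };

// Hypothetical helper: classifies the 4-byte memory image of 0x01020304.
// `b` must point to at least 4 readable bytes.
byte_order classify(const unsigned char* b)
{
    if (b[0] == 0x04 && b[1] == 0x03 && b[2] == 0x02 && b[3] == 0x01)
        return order_little;
    if (b[0] == 0x01 && b[1] == 0x02 && b[2] == 0x03 && b[3] == 0x04)
        return order_big;
    return order_other;  // e.g. a mixed layout such as 02 01 04 03
}

byte_order host_byte_order()
{
    const uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // well-defined byte copy
    return classify(bytes);
}
```

A caller can then treat order_other as "unsupported" explicitly instead of silently misreporting it as big-endian.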
0

The function should be named is_little_endian. I think you can use this union trick. Or also a cast to char.

excray
0

The code has undefined behavior, although some (most?) compilers will define it, at least in limited cases.

The intent of the standard is that reinterpret_cast be used for this. This intent isn't well expressed, however, since the standard can't really define the behavior; there is no desire to define it when the hardware won't support it (e.g. because of alignment issues). And it's also clear that you can't just reinterpret_cast between two arbitrary types and expect it to work.

From a quality of implementation point of view, I would expect both the union trick and reinterpret_cast to work, if the union or the reinterpret_cast is in the same functional block; the union should work as long as the compiler can see that the ultimate type is a union (although I've used compilers where this wasn't the case).
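
As a hedged illustration (not taken from the original answer), one use of reinterpret_cast that is generally accepted as well-defined is reading an object's bytes through a pointer to unsigned char, since character types may alias any object. The sketch below assumes sizeof(unsigned) > 1; the name lowest_address_byte_is_zero is hypothetical.

```cpp
// Hypothetical helper: true when the byte at the lowest address of
// `value` is zero. Reading through unsigned char const* is permitted
// for any object type.
bool lowest_address_byte_is_zero(unsigned const& value)
{
    unsigned char const* bytes = reinterpret_cast<unsigned char const*>(&value);
    return bytes[0] == 0;
}

// For the value 1, a zero first byte means the least significant byte is
// not stored first. Note that this cannot tell big-endian apart from
// mixed byte orders.
bool is_big_endian()
{
    static unsigned const one = 1u;
    return lowest_address_byte_is_zero(one);
}
```

This only covers casting to a character type; casting between two arbitrary types, as the answer notes, cannot be expected to work.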

James Kanze