Union hack for endian testing and byte swapping

Question

For a union, writing to one member and reading from another member (except for a char array) is UB.

//snippet 1 (testing for endianness):

union
{
    int  i;
    char c[sizeof(int)];
} x;

x.i = 1;                     // writing to i
if(x.c[0] == 1)              // reading from c[0]
{   printf("little-endian\n");
}
else
{   printf("big-endian\n");
}

//snippet 2 (swap bytes using union):

int swapbytes()
{
    union                   // assuming 32bit, sizeof(int)==4
    {        
        int  i;
        char c[sizeof(int)];
    } x;
    x.i = 0x12345678;       // writing to member i
    SWAP(x.c[0],x.c[3]);    // writing to char array elements
    SWAP(x.c[1],x.c[2]);    // writing to char array elements
    return x.i;             // reading from x.i 
}

Snippet 1 is legal C or C++ but snippet 2 is not. Am I correct? Can someone point to the section of the standard that says it's OK to write to one member of a union and read from another member that is a char array?

There is already a family of functions for handling endianness. Search for htonl(). – Martin York, Jun 15 '11 at 14:54
Rules on union punning are different between all three of C89, C99 and C++; check the sections on unions in each. In C++, for example, it's banned except for POD members with a common initial sequence (9.5/1). Accessing any object through a char pointer is a special case, though, union member or not. It's permitted under strict aliasing rules. I'm not sure whether this means it's OK to *obtain* that pointer by decay of a char array member of union, but I don't immediately see why not. – Steve Jessop, Jun 15 '11 at 15:27
It would be more interesting if this was written as a template. The choice to do the swap(s) could then be done at compile time and automated. – Martin York, Jun 15 '11 at 16:41
@Steve. Actually there are 4 different answers now: C89, C99, C++03 and C++11. (In '11 you can have non-POD union members, but you get to use placement new) – Lambdageek, Jun 15 '11 at 18:35
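
For illustration, the char-pointer special case mentioned in the comments can be used for the endianness test without any union at all. This is only a minimal sketch: the function name `is_little_endian_ptr` is hypothetical, and it assumes an implementation that provides `uint32_t`.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when the lowest-addressed byte of a
   uint32_t holds the least significant byte (little-endian), 0 otherwise. */
int is_little_endian_ptr(void)
{
    uint32_t value = 1;
    /* Inspecting an object's bytes through an unsigned char pointer is
       permitted by the aliasing rules in both C and C++. */
    const unsigned char *bytes = (const unsigned char *)&value;
    return bytes[0] == 1;
}
```

A result of 0 only says "not little-endian"; it does not by itself distinguish big-endian from a mixed byte order.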

Goz · Answer 1 · 2011-06-16T08:09:38.797

3

There is a really simple way to get around the undefined behaviour (well, undefined behaviour that is defined in pretty much every compiler out there ;)).

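#include <stdbool.h>  // bool (needed in C; built in for C++)
#include <stdint.h>   // uint32_t
#include <string.h>   // memcpy

uint32_t i = 0x12345678;
char ch[4];
memcpy( ch, &i, 4 );  // copy the object representation of i into ch

bool bLittleEndian = ch[0] == 0x78;  // true if the lowest address holds the low-order byte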

This has the added bonus that pretty much every compiler out there will see that you are memcpying a constant number of bytes and optimise out the memcpy completely, resulting in exactly the same code as your snippet 1 while staying totally within the rules!
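
The same memcpy approach can probably stand in for the union in snippet 2 as well. The following is a rough sketch rather than a definitive implementation: `swap_bytes_memcpy` is a made-up name, and it assumes 8-bit bytes so that `sizeof(uint32_t) == 4`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverses the byte order of a 32-bit value by
   copying it through a byte buffer. Assumes sizeof(uint32_t) == 4. */
uint32_t swap_bytes_memcpy(uint32_t value)
{
    unsigned char bytes[sizeof value];
    unsigned char tmp;

    memcpy(bytes, &value, sizeof value);   /* object -> bytes */
    tmp = bytes[0]; bytes[0] = bytes[3]; bytes[3] = tmp;
    tmp = bytes[1]; bytes[1] = bytes[2]; bytes[2] = tmp;
    memcpy(&value, bytes, sizeof value);   /* bytes -> object */
    return value;
}
```

With these assumptions, `swap_bytes_memcpy(0x12345678)` yields `0x78563412` regardless of whether the host is little-endian or big-endian, because the stored bytes are simply reversed.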

edited Jun 16 '11 at 08:09

answered Jun 15 '11 at 16:02

Goz


Given the squabbling about perverse sizes of `int` on the other answer, I'd probably switch this over to using the uintX_t types. – jkerian Jun 15 '11 at 16:10
Of course there are other ways to test endianness (your method is one of them). What I am interested in is whether snippet 1 is legal in C or C++. What I conclude from the little reading I have done on StackOverflow is that snippet 1 may be implementation-defined in C, while it is UB in C++ (I may be wrong). – Burt Jun 16 '11 at 03:41

score 2 · Accepted Answer · answered Jun 15 '11 at 15:01

2

I believe it (snippet 1) is technically not allowed, but most compilers allow it anyway because people use this kind of code. GCC even documents that it is supported.

You will have problems on some machines where sizeof(int) == 1, and possibly on some that are neither big endian nor little endian.

Either use available functions that change words to the proper order, or set this with a configuration macro. You probably need to recognize the compiler and OS anyway.
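
As one possible illustration of "functions that change words to the proper order", shifts and masks can put a value into a fixed byte order without ever asking what the host's endianness is. The names `store_be32` and `load_be32` below are hypothetical, not from any particular library, and the sketch assumes 8-bit bytes.

```c
#include <stdint.h>

/* Hypothetical helper: writes value into out[0..3], most significant
   byte first (big-endian / network order). */
void store_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Hypothetical helper: reads a big-endian 32-bit value from in[0..3]. */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because the shifts operate on values rather than on the object representation, this involves neither union punning nor pointer aliasing, so the legality question from the original post does not arise.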

answered Jun 15 '11 at 15:01

Bo Persson


Though there are few enough systems out there today where `sizeof(int) == 1`, so it's one of those "technically but who cares" situations. – Jonathan Grynspan Jun 15 '11 at 15:06
`sizeof(int) == 1` is a nuisance, since it means that `EOF` is equal to some value of `unsigned char`. I think it happens in some DSPs and Crays, though. If the machine isn't octet-oriented, then you may well have to do something special anyway in a situation where you're worrying about endianness. – Steve Jessop Jun 15 '11 at 15:18
http://stackoverflow.com/questions/1812348/a-question-about-union-in-c/1812359#1812359 What I gather from AndreyT's answer is that snippet 1 is legal C. AlexB's answer in the same link states that it's not technical. Also, this method is used as an example for testing endianness at c-faq.com. I learnt that this union hack for endianness was asked at an interview (source: StackOverflow). I believe the best answer is implementation-defined, even if one is reading from a char array. – Burt Jun 15 '11 at 18:21
@Burt - I don't understand the C standard well enough to see if it is allowed or not. :-) My answer was aimed at C++ which explicitly states that there is (at most) one member in the union at any time. Reading from a member that isn't there looks like UB! – Bo Persson Jun 15 '11 at 18:41
@Bo Persson: C99 says that the bytes that don't correspond to the last stored member take unspecified values (potentially implying that it's legal to read from members which share all their bytes with the last stored member), but then lists reading from a member other than the last stored as UB. C1X changes that, defining as UB reading from bytes that do not correspond to the last stored member. – ninjalj Jun 15 '11 at 18:48
@Jonathan Grynspan this typically happens on 8-bit systems and although not very common the gameboy classic & color homebrew kits tend to have sizeof(int) equal 1. – Jasper Bekkers Jun 16 '11 at 08:26
I am aware such systems exist. They're just not common these days. – Jonathan Grynspan Jun 16 '11 at 11:58
