
Ahh, nothing like the pre-C99 C standard... Where the following code just worked on little-endian machines:

union UNION_WORD {
    WORD word;
    BYTE bytes[2];
};

union UNION_WORD test;
test.word = 24576;
test.bytes[0] = 8;

And the data stored in test would be equivalent to 24584 when reading test.word. Sadly this behavior is undefined today in C and has always been undefined in C++.

As a preface, I'm an assembly kinda guy, but I want to give C++ another chance to redeem itself. I want to create a class like UNION_WORD that can be manipulated by individual bytes or the whole word, which is valid C++ and well-defined. The key here is that the C++ code:

WORD word = 24576;
BYTE byte = 8;
UNION_WORD test = 0;
test.word = word;
test.bytes[0] = byte;

compiles into something extremely simple like the following assembly (8-bit architecture):

ldi r1, 0x00
ldi r2, 0x60

ldi r3, 0x08

mov r4, r1
mov r5, r1

mov r4, r1
mov r5, r2

mov r4, r3

This code assumes everything can be done in registers without using RAM. When finished, 0x6008 (24584) will be in [r4:r5]. My biggest concern is making sure the compiler doesn't keep a stale copy of bytes[] or word cached in registers, assuming it hasn't been updated, because of a reinterpret_cast or the like. Custom operators may need to be defined. The most important thing is optimization for speed. How can this type of punning be achieved in C++?
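For illustration, here is a minimal sketch of one well-defined way to get the byte-or-word interface without a union: store a single 16-bit value and reach the bytes with shifts and masks. The names `WordBytes`, `word`, `set_word`, `byte`, `set_byte` and `demo` are hypothetical and not from any library. It assumes C++14 or later and that `std::uint16_t` exists on the target. Whether the result compiles down to plain register moves like the listing above depends on the compiler and target, so the generated assembly should be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical class: one 16-bit value, addressable as a word or as bytes.
// No union and no reinterpret_cast, so no aliasing or inactive-member issues.
class WordBytes {
public:
    constexpr WordBytes() = default;
    constexpr explicit WordBytes(std::uint16_t w) : value_(w) {}

    constexpr std::uint16_t word() const { return value_; }
    constexpr void set_word(std::uint16_t w) { value_ = w; }

    // Index 0 is the least significant byte on every platform,
    // independent of how the machine lays the word out in memory.
    constexpr std::uint8_t byte(std::size_t i) const {
        assert(i < 2);
        return static_cast<std::uint8_t>(value_ >> (8 * i));
    }

    constexpr void set_byte(std::size_t i, std::uint8_t b) {
        assert(i < 2);
        const unsigned shift = 8u * static_cast<unsigned>(i);
        const std::uint16_t mask = static_cast<std::uint16_t>(0xFFu << shift);
        // Clear the selected byte, then merge in the new one.
        value_ = static_cast<std::uint16_t>((value_ & ~mask) | (b << shift));
    }

private:
    std::uint16_t value_ = 0;
};

// Mirrors the sequence from the question: zero, store the word, patch byte 0.
constexpr std::uint16_t demo() {
    WordBytes test;
    test.set_word(24576);  // 0x6000
    test.set_byte(0, 8);   // low byte becomes 0x08
    return test.word();
}

static_assert(demo() == 24584, "expected 0x6008");
```

Because there is only one stored object, a write through `set_byte` cannot be missed by a later `word()` call, which addresses the stale-register concern. The trade-off is that byte index 0 means "low byte" by definition rather than "byte at the lowest address".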

timrau
Hackstaar
  • In general it can't, unless you're willing to rely on implementation-dependent behavior. The language specs specifically make these kinds of things undefined, to avoid issues like big-endian vs little-endian. – Barmar May 02 '21 at 10:52
  • In this example the compiler would remove all of the code since the result is never used. It would improve the question to show a realistic example of the intended usage – M.M May 02 '21 at 10:54
  • @Barmar Yes, big/little endian is implementation defined, but the OP isn't asking about that. They are asking about the (unrelated) aliasing complications, in particular whether access through one alias is updating the others. – Peter - Reinstate Monica May 02 '21 at 11:15
  • You *can* access arbitrary objects through char pointers (the one exception to the strict aliasing rule), but here we have additionally the "punning through union" issue: Write through one member, read through another. That should be the actual issue here. The issue is thorny (https://stackoverflow.com/questions/25664848/unions-and-type-punning) but appears to boil down to: If you must, use a union (what you do), should work with the major implementations. – Peter - Reinstate Monica May 02 '21 at 11:31
  • Makes me wonder... does anyone have a code example that shows type punning through a `union` breaks in C++? Not GNU g++, since that compiler explicitly supports it, and I don't think that support can be disabled. And I don't have a DS9K machine at my disposal, which would easily show it breaks. – Eljay May 02 '21 at 11:48
  • Have you seen this [Timur Doumler video](https://youtu.be/5A9NZADhTwc)? It mentions a proposal for C++23 to fill this gap. – MatG May 02 '21 at 15:07
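
The character-type route mentioned in the comments can be sketched roughly as follows. This is only an illustration: `set_first_stored_byte` is a hypothetical helper name, and it copies the object representation with `std::memcpy` instead of reading a second union member. Unlike a shift-based approach, the result depends on the byte order of the machine.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: replace the byte stored at the lowest address
// of a 16-bit value and return the modified value.
inline std::uint16_t set_first_stored_byte(std::uint16_t word, std::uint8_t b) {
    unsigned char raw[sizeof word];
    std::memcpy(raw, &word, sizeof word);  // copy out the object representation
    raw[0] = b;                            // low byte only on little-endian targets
    std::memcpy(&word, raw, sizeof word);  // copy the patched bytes back
    return word;
}
```

On a little-endian machine, `set_first_stored_byte(24576, 8)` gives 24584 (0x6008); on a big-endian machine the same call would patch the high byte and give 0x0800.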

0 Answers