Using unions to simplify casts

Question

I realize that what I am trying to do isn't safe. But I am just doing some testing and image processing so my focus here is on speed.

Right now this code gives me the corresponding bytes for a 32-bit pixel value type.

struct Pixel {
    unsigned char b,g,r,a;
};

I wanted to check if I have a pixel that is under a certain value (e.g. r, g, b <= 0x10). I figured I wanted to just conditional-test the bit-and of the bits of the pixel with 0x00E0E0E0 (I could have wrong endianness here) to get the dark pixels.

Rather than using this ugly mess (*((uint32_t*)&pixel)) to get the 32-bit unsigned int value, i figured there should be a way for me to set it up so I can just use pixel.i, while keeping the ability to reference the green byte using pixel.g.

Can I do this? This won't work:

struct Pixel {
    unsigned char b,g,r,a;
};
union Pixel_u {
    Pixel p;
    uint32_t bits;
};

I would need to edit my existing code to say pixel.p.g to get the green color byte. Same happens if I do this:

union Pixel {
    unsigned char c[4];
    uint32_t bits;
};

This would work too but I still need to change everything to index into c, which is a bit ugly but I can make it work with a macro if i really needed to.

You rarely hear the words "union" and "simplify" in the same sentence... — BlueRaja - Danny Pflughoeft, May 20 '10 at 18:49
I believe that applying the 0xE0 mask is not equivalent to <= 0x10. In particular 0x1F is greater than 0x10 and applying the mask will yield 0x00 — David Rodríguez - dribeas, May 20 '10 at 18:51
And your rejected 'this ugly mess' \*((uint32_t\*)&pixel) would also break the strict aliasing rules. — Joseph Quinsey, May 20 '10 at 19:53
Your suggested test (`i & 0x00E0E0E0UL == 0`) will actually test each pixel for `< 0x20`, not `<= 0x10` as you want. — caf, May 20 '10 at 22:23

score 20 · Accepted Answer · edited May 23 '17 at 11:53

(Edited) Both gcc and MSVC allow 'anonymous' structs/unions, which might solve your problem. For example:

union Pixel {
   struct {unsigned char b,g,r,a;};
   uint32_t bits;  // use 'unsigned' for MSVC
}

foo.b = 1;
foo.g = 2;
foo.r = 3;
foo.a = 4;
printf ("%08x\n", foo.bits);

gives (on Intel):

04030201

This requires changing all your declarations of struct Pixel to union Pixel in your original code. But this defect can be fixed via:

struct Pixel {
    union {
        struct {unsigned char b,g,r,a;};
        uint32_t bits; 
    };
} foo;

foo.b = 1;
foo.g = 2;
foo.r = 3;
foo.a = 4;
printf ("%08x\n", foo.bits);

This also works with VC9, with 'warning C4201: nonstandard extension used : nameless struct/union'. Microsoft uses this trick, for example, in:

typedef union {
    struct {
        DWORD LowPart;
        LONG HighPart;
    };  // <-- nameless member!
    struct {
        DWORD LowPart;
        LONG HighPart;
    } u;
    LONGLONG QuadPart;
} LARGE_INTEGER;

but they 'cheat' by suppressing the unwanted warning.

While the above examples are ok, if you use this technique too often, you'll quickly end up with unmaintainable code. Five suggestions to make things clearer:

(1) Change the name bits to something uglier like union_bits, to clearly indicate something out-of-the-ordinary.

(2) Go back to the ugly cast the OP rejected, but hide its ugliness in a macro or in an inline function, as in:

#define BITS(x) (*(uint32_t*)&(x))

But this would break the strict aliasing rules. (See, for example, AndreyT's answer: C99 strict aliasing rules in C++ (GCC).)

(3) Keep the original definiton of Pixel, but do a better cast:

struct Pixel {unsigned char b,g,r,a;} foo;
// ...
printf("%08x\n", ((union {struct Pixel dummy; uint32_t bits;})foo).bits);

(4) But that is even uglier. You can fix this by a typedef:

struct Pixel {unsigned char b,g,r,a;} foo;
typedef union {struct Pixel dummy; uint32_t bits;} CastPixelToBits;
// ...
printf("%08x\n", ((CastPixelToBits)foo).bits);    // not VC9

With VC9, or with gcc using -pedantic, you'll need (don't use this with gcc--see note at end):

printf("%08x\n", ((CastPixelToBits*)&foo)->bits); // VC9 (not gcc)

(5) A macro may perhaps be preferred. In gcc, you can define a union cast to any given type very neatly:

#define CAST(type, x) (((union {typeof(x) src; type dst;})(x)).dst)   // gcc
// ...
printf("%08x\n", CAST(uint32_t, foo));

With VC9 and other compilers, there is no typeof, and pointers may be needed (don't use this with gcc--see note at end):

#define CAST(typeof_x, type, x) (((union {typeof_x src; type dst;}*)&(x))->dst)

Self-documenting, and safer. And not too ugly. All these suggestions are likely to compile to identical code, so efficiency is not an issue. See also my related answer: How to format a function pointer?.

Warning about gcc: The GCC Manual version 4.3.4 (but not version 4.3.0) states that this last example, with &(x), is undefined behaviour. See http://davmac.wordpress.com/2010/01/08/gcc-strict-aliasing-c99/ and http://gcc.gnu.org/ml/gcc/2010-01/msg00013.html.

In that case it is the `struct` that is anonymous, not the union. — nategoose, May 20 '10 at 19:08
Exactly what I thought of when I read the question, but you beat me to it! — A. Levy, May 20 '10 at 20:36
Be aware that using anonymous unions and structs restricts the portability of your code. There are many C89 compilers that don't support them. — tomlogic, May 21 '10 at 17:34

fbrereto · Answer 2 · 2010-05-20T20:27:33.880

10

Why not make the ugly mess into an inline routine? Something like:

inline uint32_t pixel32(const Pixel& p)
{
    return *reinterpret_cast<uint32_t*>(&p);
}

You could also provide this routine as a member function for Pixel, called i(), which would allow you to access the value via pixel.i() if you preferred to do it that way. (I'd lean on separating the functionality from the data structure when invariants need not be enforced.)

edited May 20 '10 at 20:27

answered May 20 '10 at 18:50

fbrereto

35,429
19
126
178

-1 for even suggesting a macro when any number of other solutions would accomplish the same goal without subverting the compiler. – John Dibling May 20 '10 at 20:23
1

`reinterpret_cast` is a C++-ism. ;) – mipadi May 20 '10 at 21:37
@mipadi: At the time I answered the question it had the `[c++]` tag, but yes you are right. – fbrereto May 20 '10 at 21:38
Yeah, I noticed after my comment that the post had recently been edited to remove the C++ tag. – mipadi May 20 '10 at 22:04

score 10 · Answer 3 · answered May 20 '10 at 19:17

The problem with a structure inside a union, is that the compiler is allowed to add padding bytes between members of a structure (or class), except bit fields.

Given:

struct Pixel
{
  unsigned char red;
  unsigned char green;
  unsigned char blue;
  unsigned char alpha;
};

This could be laid out as:

Offset  Field
------  -----
0x00    red
0x04    green
0x08    blue
0x0C    alpha

So the size of the structure would be 16 bytes.

When put in a union, the compiler would take the larger capacity of the two to determine space. Also, as you can see, a 32 bit integer would not align correctly.

I suggest creating functions to combine and extract pixels from a 32-bit quantity. You can declare it inline too:

void Int_To_Pixel(const unsigned int word,
                  Pixel& p)
{
  p.red =   (word & 0xff000000) >> 24;
  p.blue =  (word & 0x00ff0000) >> 16;
  p.green = (word & 0x0000ff00) >> 8;
  p.alpha = (word & 0x000000ff);
  return;
}

This is a lot more reliable than a struct inside a union, including one with bit fields:

struct Pixel_Bit_Fields
{
  unsigned int red::8;
  unsigned int green::8;
  unsigned int blue::8;
  unsigned int alpha::8;
};

There is still some mystery when reading this whether red is the MSB or alpha is the MSB. By using bit manipulation, there is no question when reading the code.

Just my suggestions, YMMV.

Using unions to simplify casts

3 Answers3