What are the disadvantages of using a union to store data such as a series of bytes, so that the bytes can be accessed either all at once or one by one?

Example: a color can be represented in RGBA, so a color type may be defined as:

typedef unsigned int RGBAColor;

Then we can use bit shifting and masking to retrieve or set the red, green, blue and alpha values of an RGBAColor object (just as Direct3D does with macros such as D3DCOLOR_ARGB()).
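As an illustration of the shifting-and-masking approach, here is a minimal sketch. The helper names (`makeRGBA`, `red`, `withGreen`, and so on) are hypothetical, the layout with red in the highest byte and alpha in the lowest is an assumption made for this example (D3DCOLOR_ARGB uses a different channel order), and `std::uint32_t` is used in place of `unsigned int` to pin the width to 32 bits.

```cpp
#include <cstdint>

// Assumption: a 32-bit color, red in the highest byte, alpha in the lowest.
using RGBAColor = std::uint32_t;

// Pack four 8-bit channels into one integer (hypothetical helper).
constexpr RGBAColor makeRGBA(std::uint8_t r, std::uint8_t g,
                             std::uint8_t b, std::uint8_t a) {
    return (static_cast<RGBAColor>(r) << 24) |
           (static_cast<RGBAColor>(g) << 16) |
           (static_cast<RGBAColor>(b) << 8)  |
            static_cast<RGBAColor>(a);
}

// Read one channel: shift it down to the low byte, then mask.
constexpr std::uint8_t red(RGBAColor c)   { return static_cast<std::uint8_t>((c >> 24) & 0xFF); }
constexpr std::uint8_t green(RGBAColor c) { return static_cast<std::uint8_t>((c >> 16) & 0xFF); }
constexpr std::uint8_t blue(RGBAColor c)  { return static_cast<std::uint8_t>((c >> 8) & 0xFF); }
constexpr std::uint8_t alpha(RGBAColor c) { return static_cast<std::uint8_t>(c & 0xFF); }

// Write one channel: clear its byte, then OR in the new value.
constexpr RGBAColor withGreen(RGBAColor c, std::uint8_t g) {
    return (c & ~(RGBAColor{0xFF} << 16)) | (static_cast<RGBAColor>(g) << 16);
}

// The numeric value is the same on every platform, whatever its byte order.
static_assert(makeRGBA(0x11, 0x22, 0x33, 0x44) == 0x11223344u, "packing check");
```

Because the shifts operate on the value rather than on its bytes in memory, `red(c)` returns the same result on little-endian and big-endian machines.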

But what if I used a union instead?

union RGBAColor
{
    unsigned int Color;
    struct RGBAColorComponents
    {
        unsigned char Red;
        unsigned char Green;
        unsigned char Blue;
        unsigned char Alpha;
    } Component;
};

Then I would not always need to shift (<<) or mask (&) to read or write the color components. But is there a problem with this? (I suspect there is, because I haven't seen anyone use this method.)

Can endianness be a problem? If we always use Component to access the color components and Color to access the whole value (for copying, assigning, etc.), endianness should not be a problem, right?
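To make the endianness question concrete: byte order does not affect a plain copy of the whole value, but it does decide which byte of the integer ends up at which address, so it matters as soon as the integer is built from a numeric literal or handed to an API that expects a particular packing. The sketch below inspects the in-memory bytes of a 32-bit value with `memcpy`, which is well defined in both C and C++. The function name `bytesInMemory` is hypothetical, and the result depends on the platform.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Return the bytes of a 32-bit value in the order they sit in memory.
// (Hypothetical helper for illustration only.)
std::array<std::uint8_t, 4> bytesInMemory(std::uint32_t value) {
    std::array<std::uint8_t, 4> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// True if the lowest-order byte of the integer is stored first,
// i.e. the platform is little-endian.
bool lowByteComesFirst() {
    return bytesInMemory(0x11223344u)[0] == 0x44;
}
```

On a little-endian machine the first byte in memory is the low-order one, so a struct whose first member is `Red` would line up with the low byte of the integer; on a big-endian machine it would line up with the high byte.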

-- EDIT -- I found an old post about the same problem, so I guess this question is kind of a repost :P Sorry for that. Here is the link: Is it a good practice to use unions in C++?

According to the answers there, using a union for the given example seems to be OK in C++, because the data type does not change; there are just two ways to access the same data. Please correct me if I am wrong. Thanks. :)

Deamonpog

2 Answers

This usage of unions is not allowed in C++, where a union comprises overlapping but mutually exclusive objects: writing one member of a union and then reading a different member is undefined behavior.

It is legal in C, where it is a recommended way of type punning.

This relates to the issue of (strict) aliasing, which is a difficulty the compiler faces when trying to determine whether two objects with different types are distinct. The language standards disagree because the experts are still figuring out what guarantees can safely be provided without sacrificing performance. Personally, I avoid all of this. What would the int actually be used for? The safe way to translate is to copy the bytes, for example with memcpy.
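A minimal sketch of the byte-copy approach, assuming the component struct from the question occupies exactly four bytes with no padding (checked by the `static_assert`). The function names `packBytes` and `unpackBytes` are hypothetical, and `std::uint32_t` stands in for the question's `unsigned int`.

```cpp
#include <cstdint>
#include <cstring>

struct RGBAColorComponents {
    unsigned char Red;
    unsigned char Green;
    unsigned char Blue;
    unsigned char Alpha;
};

// Assumption of this sketch: the struct has no padding.
static_assert(sizeof(RGBAColorComponents) == sizeof(std::uint32_t),
              "sketch assumes a 4-byte struct");

// Copy the four component bytes into an integer (hypothetical helper).
std::uint32_t packBytes(const RGBAColorComponents& c) {
    std::uint32_t out;
    std::memcpy(&out, &c, sizeof out);
    return out;
}

// Copy the integer's bytes back into the component struct.
RGBAColorComponents unpackBytes(std::uint32_t v) {
    RGBAColorComponents c;
    std::memcpy(&c, &v, sizeof c);
    return c;
}
```

A round trip through `packBytes` and `unpackBytes` returns the original components on any platform; the numeric value of the intermediate integer still depends on the byte order. Compilers commonly reduce a fixed-size `memcpy` like this to a plain register move, though that is an optimization rather than a guarantee.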

There is also the endianness issue, but whether that matters depends on what you want to do with the int.

Potatoswatter
  • Er, no. Using unions to reinterpret data is the exact *opposite* of deprecated, it was codified into C11 as a standardization of existing practice. `union { struct foo x; unsigned y[4]; };` is basically the **new preferred** way to shuffle bits between types when you're too good for `memcpy()`. Strict aliasing rules are what you break when you cast pointers around, e.g., `(unsigned *) &x`. And yes, I mean "new" -- only in the past decade have "advances" in compilers caused problems for those who break the actual strict aliasing rules. – Dietrich Epp Feb 07 '13 at 07:28
  • So if I am understanding correctly, according to the strict aliasing rule the problem arises if I use pointers inside the union, like `union { unsigned int * pi; unsigned char * pc; }`, and it is fine to use `union { unsigned int val; unsigned char vals[4]; }`. (I didn't know about the strict aliasing rule, so pardon me. I am reading [this article](http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html) now.) – Deamonpog Feb 07 '13 at 07:40
  • @DietrichEpp Are you sure? It might be different between C11 and C++11. – Potatoswatter Feb 07 '13 at 08:10
  • @Deamonpog: You're breaking completely different rules when you do that. Don't do that. When you need to convert a pointer from one type to another, use a cast instead. – Dietrich Epp Feb 07 '13 at 08:14
  • I am sure that accessing the wrong member of a `union` is not against strict aliasing rules. In C, it *was* against a different rule which forbade accessing members of unions other than the "correct" one, but this rule has been changed so the technique is valid but the results are implementation-defined. – Dietrich Epp Feb 07 '13 at 08:16
  • As far as C++ is concerned, the `union` technique is the de facto method of doing this, precisely because (1) it doesn't break strict aliasing (2) other techniques do break strict aliasing (3) if you break strict aliasing you will probably run into problems. That is, assuming that you can't use `memcpy()` and you're not casting to `unsigned char`, which is okay anyway. – Dietrich Epp Feb 07 '13 at 08:17
  • Thanks. I am actually using C++. So using this method will be fine for the colors. :) – Deamonpog Feb 07 '13 at 08:50
  • @DietrichEpp [This](http://stackoverflow.com/a/12588202/153285) and [this answer](http://stackoverflow.com/questions/7291874/is-strict-aliasing-is-c-or-c-thing) show that aliasing by `union` is OK in C, not in C++. C++ allows `memcpy`, which effectively initializes an object before its lifetime begins. But it does not permit two objects to simultaneously live in the same place. – Potatoswatter Feb 07 '13 at 14:22
  • @Potatoswatter: Yes, that's what "de facto" means. – Dietrich Epp Feb 07 '13 at 22:40
  • @DietrichEpp If the de facto practice is UB, then it still needs to change. The rules are whatever they are defined to be; popular opinion doesn't matter. – Potatoswatter Feb 08 '13 at 00:40
  • @Potatoswatter: I think this is veering farther off topic. Your answer claims that this is against the strict aliasing rules when in fact it is **not** against strict aliasing rules. Your answer claims that this behavior is deprecated without referencing which standard, but this is incorrect no matter which standard you choose. In C++ the technique cannot be called "deprecated" because it was never legal to begin with. In C, "deprecated" is the exact opposite of the correct word to use, since the behavior was UB but is no longer UB. – Dietrich Epp Feb 08 '13 at 01:00
  • The discussion about "de facto" was to expand upon the topic. Historically speaking, both standards committees have seen it as part of their mandate to codify existing practices: they have the viewpoint that existing code is more valuable than existing implementations. So the reason I brought it up is that I consider it far more likely that a future C++ standard will redefine the `union` trick to have defined behavior, and somewhat less likely that compiler vendors will introduce optimizations that cause undesired behavior. This is not mathematics; the standard is not all-important. – Dietrich Epp Feb 08 '13 at 01:05
  • @DietrichEpp Thanks for clarifying your meaning. It's a contentious issue… – Potatoswatter Feb 08 '13 at 01:53
I believe using the union solves any problems related to endianness, as the RGBA byte order is most likely defined in network order. Also, the fact that each component is a uint8_t or similar can help some compilers use sign- or zero-extending loads, store the low 8 bits directly through an unaligned byte pointer, and even parallelize some byte operations (e.g. ARM has some packed 4x8-bit instructions).

Aki Suihkonen