
I am looking at some C++ code and I want to find out what a union is doing to help translate a byte array into a different type, such as a WORD. At least, that is what I think is going on. What I really want is to figure out the purpose of this code, but I think I understand some of it.

My research has brought me bits and pieces of understanding, but I am not confident that I see the big picture correctly.

So let's say I have a union defined as:

typedef union _BYTE_TO_WORD {
    BYTE b[2];
    WORD w;
    short s;
} BYTE_TO_WORD;

Note that BYTE here is 8 bits, WORD is an unsigned short, and both shorts (signed and unsigned) are 16 bits.
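A minimal sketch of what that declaration implies, assuming `BYTE` and `WORD` map to `std::uint8_t` and `std::uint16_t` (stand-ins for the Windows typedefs, which are not included here). All members start at the same address, so the union is only as large as its largest member:

```cpp
#include <cstdint>

using BYTE = std::uint8_t;   // assumption: stand-in for the Windows BYTE typedef
using WORD = std::uint16_t;  // assumption: stand-in for the Windows WORD typedef

typedef union _BYTE_TO_WORD {
    BYTE b[2];
    WORD w;
    short s;
} BYTE_TO_WORD;

// Every member sits at offset 0, so the union occupies two bytes,
// not 2 + 2 + 2 (given the 16-bit short described above).
static_assert(sizeof(BYTE_TO_WORD) == 2, "members overlap");

// Hypothetical helper: true when b, w and s all begin at the same address.
// It only compares addresses; no inactive member is read.
bool members_share_address(const BYTE_TO_WORD& u) {
    const void* pb = &u.b;
    const void* pw = &u.w;
    const void* ps = &u.s;
    return pb == pw && pw == ps;
}
```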

Then what happens if, in the main code, I have a struct like this?

BYTE data[4] = { /* someData */ };  // at least four bytes of input

struct TWO_WORDS {
    _BYTE_TO_WORD word1;
    _BYTE_TO_WORD word2;
}*theWordsIWant = (struct TWO_WORDS*)&data;

I think that the code above takes the first two bytes of data and puts them into word1, and then the next two bytes of data are put into word2. With all the information about unions and structs out there, I can't seem to pin down a search that explains this code. If I am wrong here, please tell me.
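A sketch of the layout that guess relies on, under the same `std::uint8_t`/`std::uint16_t` assumption. The `offsetof` checks would fail to compile on an implementation that inserted padding between the members; on common ones they pass, so `word1` overlays bytes 0-1 and `word2` overlays bytes 2-3. Note that the pointer cast itself copies nothing: it only reinterprets the address of `data`.

```cpp
#include <cstddef>
#include <cstdint>

using BYTE = std::uint8_t;   // assumption: stand-in for the Windows BYTE typedef
using WORD = std::uint16_t;  // assumption: stand-in for the Windows WORD typedef

union _BYTE_TO_WORD {
    BYTE b[2];
    WORD w;
    short s;
};

struct TWO_WORDS {
    _BYTE_TO_WORD word1;
    _BYTE_TO_WORD word2;
};

// Unlike union members, struct members follow one another in memory.
static_assert(offsetof(TWO_WORDS, word1) == 0, "word1 starts at byte 0");
static_assert(offsetof(TWO_WORDS, word2) == 2, "word2 starts at byte 2");
static_assert(sizeof(TWO_WORDS) == 4, "no trailing padding");
```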

If I am right about that, then which member of word1 or word2 holds the value? My research says that word1 would hold a byte array, since a union can only hold one value at a time.

The translation must happen in another part of the code (which I haven't found yet), where we do something like this (assuming I could cast the bytes to a WORD):

theWordsIWant.w = (WORD)theWordsIWant.b;

So the bonus question is: why go to all this trouble with the union, when you could simply cast it to a different variable?

WORD w = (WORD)theWordsIWant.b;

Perhaps what is really going on is that the code will "cast a pointer to anything" as one answer here suggests (How to convert from byte array to word array in c).

I am pretty sure I am missing something, either in the motivation for doing this, or the way it works. But then again, maybe I actually understand it after all? I don't know. You tell me.

amalgamate
  • In a `union` all the members start at the same address. That is, the bytes that comprise each field of the union are all the same bytes. When one member changes, they all change (in the overlapping bytes). – Pointy Sep 11 '14 at 22:32
  • @Pointy I was just looking at that on MSDN... but it didn't sink in... Then the casting is happening automatically because they are the same size. Yes? Thanks. – amalgamate Sep 11 '14 at 22:34
  • right - the union is a way to "look at" the same piece of memory as if it were being used for different types of values. – Pointy Sep 11 '14 at 22:35
  • @Pointy I know I am cheating really by secretly asking two questions, but the struct can then be assigned a pointer that it maps to in a non-overlapping way, with the "}*theWordsIWant = (struct TWO_WORDS*)&data;" bit? – amalgamate Sep 11 '14 at 22:38
  • @Pointy I think I get it after reading over Kaz's answer (weird, that was once a nickname of mine) anyway... it's just converting the pointer to the bytes to a struct TWO_WORDS * pointer. – amalgamate Sep 11 '14 at 22:49
  • At the risk of marking my own question a duplicate, this question/answer explains unions quite well: http://stackoverflow.com/questions/346536/difference-between-a-structure-and-a-union-in-c. – amalgamate Sep 25 '14 at 14:57
  • this duplicate too: http://stackoverflow.com/questions/4003087/whats-the-major-difference-between-union-and-struct-in-c?rq=1 – amalgamate Sep 25 '14 at 14:58

1 Answer


This statement:

theWordsIWant.w = (WORD)theWordsIWant.b;

will not have the effect of loading two bytes from b and making them into a word. Since b is an array, the expression theWordsIWant.b produces a pointer to the first element, a BYTE * pointer. Its value is the address of the two bytes, so you're converting the address of the bytes to type WORD, not the contents of the bytes themselves.
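To make the decay concrete, here is a sketch (assuming `BYTE` is `std::uint8_t`) that performs the pointer-to-integer conversion into `std::uintptr_t` rather than `WORD`; on a typical 64-bit target, compilers such as GCC and Clang reject a cast from a pointer straight to a 16-bit `WORD`, because that integer is too small to hold an address. The result depends only on where the array lives, never on what it contains:

```cpp
#include <cstdint>
#include <type_traits>

using BYTE = std::uint8_t;  // assumption: stand-in for the Windows BYTE typedef

// Hypothetical helper mirroring the start of "(WORD)theWordsIWant.b":
// the array operand decays to a pointer to its first element, and turning
// that pointer into an integer yields an address. The bytes are never read.
std::uintptr_t what_the_cast_converts(const BYTE (&b)[2]) {
    static_assert(std::is_same<decltype(+b), const BYTE*>::value,
                  "in an expression the array decays to const BYTE*");
    return reinterpret_cast<std::uintptr_t>(+b);
}
```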

What the union saves you from doing (at the cost of portability) is, rather, this type of code:

WORD w = ((WORD) b[1] << 8) | b[0];
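That shift-and-OR line can be wrapped into a small function. This is a sketch, assuming `BYTE`/`WORD` are `std::uint8_t`/`std::uint16_t` and, as in the line above, that `b[0]` holds the low-order byte. Because it works on byte values rather than memory layout, it gives the same result on any host byte order:

```cpp
#include <cstdint>

using BYTE = std::uint8_t;   // assumption: stand-in for the Windows BYTE typedef
using WORD = std::uint16_t;  // assumption: stand-in for the Windows WORD typedef

// Hypothetical helper: builds a WORD from two bytes stored low byte first.
// Only the byte values are used, so the host's endianness does not matter.
WORD word_from_bytes_le(const BYTE b[2]) {
    return static_cast<WORD>((static_cast<WORD>(b[1]) << 8) | b[0]);
}
```

On a little-endian machine such as x86, this typically matches what reading w from the union yields; on a big-endian machine the union would see the two bytes in the opposite order.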

The union does it using logic that is very similar to this type of code:

WORD w = *(WORD *) b;  // rather than: WORD w = (WORD) b;

That is: convert the pointer to the bytes to a WORD * pointer (pointer to WORD) and then dereference it to access both bytes simultaneously as a single WORD. What we are doing here is using pointer conversions to do type punning: we are creating an aliased view of b[0] and b[1] as if they were a single object of type WORD.
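In C++, `*(WORD *) b` is itself risky: it reads a `BYTE` array through a `WORD` lvalue, which the aliasing rules do not permit, and `b` need not be suitably aligned for a `WORD`. A sketch of a well-defined way to get the same native-byte-order view, assuming the same typedefs, is to copy the object representation with `std::memcpy`:

```cpp
#include <cstdint>
#include <cstring>

using BYTE = std::uint8_t;   // assumption: stand-in for the Windows BYTE typedef
using WORD = std::uint16_t;  // assumption: stand-in for the Windows WORD typedef

// Hypothetical helper: the same "view two bytes as one WORD" result as
// *(WORD *) b, in the host's native byte order, but done with a byte copy,
// so there is no aliasing or alignment problem.
WORD word_from_bytes_native(const BYTE* b) {
    WORD w;
    std::memcpy(&w, b, sizeof w);
    return w;
}
```

Compilers commonly turn a two-byte `memcpy` like this into a single load. In C++20, `std::bit_cast` is another well-defined option when the source is an object of the same size, such as a `std::array<BYTE, 2>`.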

The union type in C and C++ does this declaratively. A union is like a struct, except that all the members are at offset 0: they overlap. The union has well-defined, portable behavior if we only ever read the member we last stored into. If we assign a value to w and then read w, the behavior is uncontroversial. With a union, though, we can also assign to b[0] and b[1] and then read w. The behavior is then "unspecified" (in C, as of the C99 standard).
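A sketch of the uncontroversial case, assuming the same `std::uint8_t`/`std::uint16_t` stand-ins: store into `w`, then read `w`. If the individual bytes of `w` are needed, copying its object representation avoids reading the inactive member `b` at all:

```cpp
#include <cstdint>
#include <cstring>

using BYTE = std::uint8_t;   // assumption: stand-in for the Windows BYTE typedef
using WORD = std::uint16_t;  // assumption: stand-in for the Windows WORD typedef

union BYTE_TO_WORD {
    BYTE b[2];
    WORD w;
    short s;
};

// Well-defined in both C and C++: store into w, then read w back.
WORD store_and_read_w(WORD value) {
    BYTE_TO_WORD u;
    u.w = value;   // w becomes the active member
    return u.w;    // reading the member last stored into
}

// Hypothetical helper: inspects the bytes of w without reading the
// inactive member b, by copying w's object representation.
void bytes_of_w(WORD value, BYTE out[2]) {
    BYTE_TO_WORD u;
    u.w = value;
    std::memcpy(out, &u.w, sizeof u.w);
}
```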

In C++, using a union for type punning is no more defined than using pointers for the same purpose: it is undefined behavior. Whether such code works at all depends on the implementation.

Kaz
  • Many experts believe that the union also introduces undefined behavior, because you're reading from an inactive member. Unions are made for saving memory, not type punning. – Ben Voigt Sep 11 '14 at 22:46
  • @BenVoigt Those experts can believe whatever they want; the C standard makes the union access unspecified behavior. So for instance, compilers cannot break it on grounds of strict aliasing: a stale value cannot be read from a union member because of a caching optimization which depends on type. It should be more reliable than "bare" pointer-based punning. If we know everything about the representations that are involved, based on an implementation's documents, we can deduce the behavior of union-based type punning. – Kaz Sep 11 '14 at 23:07
  • The C standard has no applicability to this C++ question. C++ does not use the same rules for type punning in general, or unions in particular, as C. – Ben Voigt Sep 11 '14 at 23:09
  • @BenVoigt In that case, it may be necessary to put such code into a C translation unit and reach it via an `extern "C"` API. Or else rely on the C++ compiler vendor's assurances that supplant the undefined behavior. – Kaz Sep 11 '14 at 23:16
  • True, being undefined by the standard doesn't prohibit implementations from adding behavior guarantees. – Ben Voigt Sep 11 '14 at 23:27
  • @BenVoigt I looked this up in the n3242 draft (9.5 Unions); there indeed isn't anything like what is in the C standard. – Kaz Sep 11 '14 at 23:30