Conceptual problem in Union

Question

My code is this

// using_a_union.cpp
#include <stdio.h>

union NumericType
{
    int         iValue;
    long        lValue;  
    double      dValue;  
};

int main()
{
    union NumericType Values = { 10 };   // iValue = 10
    printf("%d\n", Values.iValue);
    Values.dValue = 3.1416;
    printf("%d\n", Values.iValue); // garbage value
}

Why do I get a garbage value when I try to print Values.iValue after doing Values.dValue = 3.1416? I thought the memory layout would be like this. What happens to Values.iValue and Values.lValue when I assign something to Values.dValue?

is it an endian conversion issue? ~ Define "garbage". That would really help us answer your question. — jcolebrand, Nov 17 '10 at 04:37
@drachenstern: His question looks perfectly well phrased and formatted... much better than the usual "show me teh codez" we get from new-comers. — mpen, Nov 17 '10 at 04:43
@Mark ~ I agree. But since he's new, it might be nice to have a site reference. Also, we need to know what he defines as "garbage" and that would've been a nice inclusion. At least I didn't directly link him to JonSkeet's article ;) ... I've just been doing it for a lot of newcomers as a bit of housekeeping. Welcome to the club, be a good community member, that sort of thing. It wasn't meant as a detrimental post. — jcolebrand, Nov 17 '10 at 04:46
I don't understand. You posted a link to a picture of what you think is the memory layout. This picture is correct and it immediately answers your own question. Why are you asking it then? — AnT stands with Russia, Nov 17 '10 at 06:00

James McNellis · Accepted Answer · 2010-11-17T05:45:03.307

9

In a union, all of the data members overlap. You can only use one data member of a union at a time.

iValue, lValue, and dValue all occupy the same space.

As soon as you write to dValue, the iValue and lValue members are no longer usable: only dValue is usable.

Edit: To address the comments below: You cannot write to one data member of a union and then read from another data member. To do so results in undefined behavior. (There's one important exception: you can reinterpret any object in both C and C++ as an array of char. There are other minor exceptions, like being able to reinterpret a signed integer as an unsigned integer.) You can find more in both the C Standard (C99 6.5/6-7) and the C++ Standard (C++03 3.10, if I recall correctly).

Might this "work" in practice some of the time? Yes. But unless your compiler expressly states that such reinterpretation is guaranteed to work correctly and specifies the behavior that it guarantees, you cannot rely on it.
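To make the exception above concrete, here is a minimal C++ sketch of two ways to look at the union's bytes that stay within the rules: copying them out with std::memcpy, and reading them through unsigned char. The helper names leading_bytes_as_int and byte_at are made up for illustration; NumericType is the union from the question. The sketch assumes sizeof(int) <= sizeof(double).

```cpp
#include <cstddef>
#include <cstring>

union NumericType
{
    int    iValue;
    long   lValue;
    double dValue;
};

// Copies the first sizeof(int) bytes of the union's storage into an int.
// memcpy reads the object representation, so no inactive member is read.
// Which value comes out is still implementation-specific.
int leading_bytes_as_int(const NumericType& u)
{
    static_assert(sizeof(int) <= sizeof(double),
                  "sketch assumes int is no larger than double");
    int result;
    std::memcpy(&result, &u, sizeof result);
    return result;
}

// Returns one byte of the union's storage through unsigned char access,
// which is the "array of char" exception.
unsigned char byte_at(const NumericType& u, std::size_t index)
{
    return reinterpret_cast<const unsigned char*>(&u)[index];
}
```

Calling leading_bytes_as_int on a union holding iValue = 10 gives back 10. After Values.dValue = 3.1416 it gives whatever the leading bytes of that double happen to spell out as an int on your platform, which is the "garbage" from the question, obtained without reading an inactive member.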

edited Nov 17 '10 at 05:45

answered Nov 17 '10 at 04:37

James McNellis


1

That's not true (and I didn't downvote you). The other two values are indeed usable. – jcolebrand Nov 17 '10 at 04:40
@drachenstern: What part of my answer is not true? You can only use one data member of a union at a time. Have I misunderstood the question? – James McNellis Nov 17 '10 at 04:41
1

Again, not true. union Pixel { int rgb; char r; char g; char b; } All 'usable'. – Ed S. Nov 17 '10 at 04:43
2

I'll quote Wikipedia `One common C programming idiom uses unions to perform what C++ calls a reinterpret_cast, by assigning to one field of a union and reading from another, as is done in code which depends on the raw representation of the values.` ... the values are indeed usable. It's one of the main strengths of using a `union`. – jcolebrand Nov 17 '10 at 04:44
1

@Ed: Yes. All four data members are usable. However, _you can only use one at a time._ As soon as you write to `r`, the only data member you can read from is `r`. A union stores one value. – James McNellis Nov 17 '10 at 04:44
@James: He just told you what part isn't true... you can use all the values. Whether or not they contain anything logical is another story. The image he linked to indicates he understands that the data overlaps. – mpen Nov 17 '10 at 04:45
Again... not true. If I assign a value to the int part I can certainly read the component values (in an admittedly unportable way). I should have added an alpha channel byte, but you get it. – Ed S. Nov 17 '10 at 04:47
1

@Ed: Your example isn't right. All 3 chars overlap. You need to put them in a struct to make it work. I just compiled and ran an example though, and it appears the byte order is reversed on my machine. – mpen Nov 17 '10 at 04:53
7

@drachstern: Wikipedia is correct: using a union for object reinterpretation is a "common C programming idiom." Doing so is also, as far as both the C and C++ languages are concerned, wrong. Writing to one member of a union and then reading from a different member results in _undefined behavior._ You can read more in the C standard (C99 6.5/6-7 and C++03 3.10). – James McNellis Nov 17 '10 at 05:05
@Mark: See my above comment. – James McNellis Nov 17 '10 at 05:05
1

+1 @James. I also like this answer: http://stackoverflow.com/questions/1812348/a-question-about-union-in-c/1812359#1812359 – Fred Larson Nov 17 '10 at 05:07
1

@James: undefined behaviour != unusable. You need to make it clear that *according to the spec* it's undefined and perhaps shouldn't be relied upon across compilers, but that doesn't mean it'll (necessarily) give a compiler or run-time error. – mpen Nov 17 '10 at 05:07
This was most interesting. I've never had an answer get three downvotes before, even when I've posted answers that turned out to actually be wrong. – James McNellis Nov 17 '10 at 05:08
@Fred: Yeah, Andrey's answer there is top-notch. – James McNellis Nov 17 '10 at 05:08
Yes, you're right, I was typing a bit too quickly for my own good there. A char[4] would work however and that is the point. – Ed S. Nov 17 '10 at 05:12
1

@James: You should add that(Standard thing) to your answer (otherwise you can get more downvotes). BTW I haven't downvoted. – Prasoon Saurav Nov 17 '10 at 05:13
My main gripe is that this response does not answer the question at all. If the OP understood what was going on under the covers and why interpreting the first 4 bytes of a double as an int (or long) doesn't work he would be much better off. You don't provide that information here. – Ed S. Nov 17 '10 at 05:13
@Ed: The OP asks "What happens to `Values.iValue` and `Values.lValue`; when I assign something to `Values.dValue`?" I thought the question was quite clearly asked and I interpreted it literally. If the OP was asking this question out of confusion over the differences between floating point numbers and integers, I'd have expected that he would either have accepted one of the other answers to this question or followed up with another question. – James McNellis Nov 17 '10 at 05:19
@Prasoon: A good idea, though I'm more amused by the downvotes than anything. – James McNellis Nov 17 '10 at 05:21
There may be (have been?) weird architectures where some combinations of bits doesn't produce a legal integer, and other corner cases leading to the Standard classifying the behaviour as undefined, but the practical reality is that the behaviour is very well understood, and using unions as a kind of implicit reinterpret-cast mechanism is common to masses of system headers and OS-level files. As an amusing more than compelling example, my Linux box has ieee754.h defining bit fields for peeking/poking into floats and doubles, directly addressing the sign, mantissa, exponent etc.. – Tony Delroy Nov 17 '10 at 05:22
@Tony: That's true, though compilers are starting to utilize the strict aliasing rule to make aggressive optimizations. The question linked by Fred Larson in a comment above discusses this. – James McNellis Nov 17 '10 at 05:24
Fair enough, but I think that, due to how basic the question is, it is safe to assume that the OP doesn't really know how this stuff works. In that case it is good to give him a clear conceptual model of what is going on. – Ed S. Nov 17 '10 at 05:32
I apologize to the OP then for not providing a detailed exposition on the nature of `union` types and value representations. – James McNellis Nov 17 '10 at 05:58
3

@Ed Swangren: Absolutely incorrect. In both C and C++ it is explicitly illegal to read any components of a union except for the one that has been assigned last. Repeating that what you claim is legal many times won't make it true. Unions in C and C++ are tools for "memory time-sharing", not for memory reinterpretation, even though they are often incorrectly used for the latter purpose. – AnT stands with Russia Nov 17 '10 at 06:02
1

James is right. If you write to one member of a union and read from another, you evoke undefined behavior. So sayeth the Standard. – John Dibling Nov 17 '10 at 06:27
@James: do you refer above to Alex B's accepted answer? - that illustrates that reads from uninitialised memory can return different results at different optimisation levels, but says nothing about union usage where all the bits have been set. – Tony Delroy Nov 19 '10 at 04:00

Ed S. · Answer 2 · 2010-11-17T05:21:18.480

Because floating point numbers are represented differently than integers are.

All of those variables occupy the same area of memory (the double obviously occupies more of it than the int does). If you try to read the first four bytes of that double as an int, you are not going to get back what you expect. You are dealing with raw memory layout here, and you need to know how these types are represented.

EDIT: I should have also added (as James has already pointed out) that writing to one variable in a union and then reading from another does invoke undefined behavior and should be avoided (unless you are re-interpreting the data as an array of char).
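As a rough illustration of "represented differently", here is a small C++ sketch that splits a double into its sign, exponent, and fraction fields. It assumes the platform uses the IEEE 754 64-bit format for double (the static_assert checks this); DoubleFields and decode are names invented for the example. It copies the bits with std::memcpy instead of reading another union member.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// The three fields of an IEEE 754 double (assumed format, see static_assert).
struct DoubleFields
{
    unsigned      sign;      // 1 bit: 0 = positive, 1 = negative
    unsigned      exponent;  // 11 bits, stored with a bias of 1023
    std::uint64_t fraction;  // 52 bits after the implicit leading 1
};

DoubleFields decode(double d)
{
    static_assert(std::numeric_limits<double>::is_iec559 &&
                  sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit IEEE 754 double");

    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // copy the raw bit pattern

    DoubleFields f;
    f.sign     = static_cast<unsigned>(bits >> 63);
    f.exponent = static_cast<unsigned>((bits >> 52) & 0x7FF);
    f.fraction = bits & ((std::uint64_t(1) << 52) - 1);
    return f;
}
```

For 3.1416 the stored exponent is 1024 (that is, 2^1 after removing the bias) and the fraction holds the digits of 1.5708. None of these fields lines up with the plain binary 1010 that an int uses for 10, which is why the leading bytes read as an int look meaningless.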

score 2 · Answer 3 · answered Nov 17 '10 at 05:02

Well, let's just look at a simpler example first. Ed's answer describes the floating part, but how about we examine how ints and chars are stored first!

Here's an example I just coded up:

#include "stdafx.h"
#include <iostream>
using namespace std;

union Color {
    int value;
    struct {
        unsigned char R, G, B, A;
    };
};

int _tmain(int argc, _TCHAR* argv[])
{
    Color c;
    c.value = 0xFFCC0000;
    cout << (int)c.R << ", " << (int)c.G << ", " << (int)c.B << ", " << (int)c.A << endl;
    getchar();
    return 0;
}

What would you expect the output to be?

255, 204, 0, 0

Right?

If an int is 32 bits, and each of the chars is 8 bits, then R should correspond to the left-most byte, G the second one, and so forth.

But that's wrong. At least on my machine/compiler, it appears ints are stored in reverse byte order. I get,

0, 0, 204, 255

So to make this give the output we'd expect (or the output I would have expected anyway), we have to change the struct to A,B,G,R. This has to do with endianness.

Anyway, I'm not an expert on this stuff, just something I stumbled upon when trying to decode some binaries. The point is, floats aren't necessarily encoded the way you'd expect either... you have to understand how they're stored internally to understand why you're getting that output.
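If the goal is just to pull the channels out of a packed value, one way to sidestep the byte-order question is to use shifts and masks, which work on the value and not on its memory layout. This is a sketch, not what the code above does; Channels, unpack, and is_little_endian are hypothetical names, and the R-in-the-top-byte layout is an assumption taken from the expected output above.

```cpp
#include <cstdint>
#include <cstring>

struct Channels
{
    unsigned r, g, b, a;
};

// Splits a packed 32-bit value arithmetically. The result is the same on
// little-endian and big-endian machines because no memory is reinterpreted.
Channels unpack(std::uint32_t value)
{
    Channels c;
    c.r = (value >> 24) & 0xFF;  // most significant byte
    c.g = (value >> 16) & 0xFF;
    c.b = (value >> 8)  & 0xFF;
    c.a =  value        & 0xFF;  // least significant byte
    return c;
}

// Reports whether the least significant byte sits at the lowest address,
// by copying out the first stored byte of the value 1.
bool is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

unpack(0xFFCC0000) yields 255, 204, 0, 0 regardless of the machine, while is_little_endian() tells you which order the union-based version would show.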

Tony Delroy · Answer 4 · 2010-11-17T05:11:58.350

You've done this:

union NumericType Values = { 10 };   // iValue = 10 
printf("%d\n", Values.iValue); 
Values.dValue = 3.1416;

How a compiler uses memory for this union is similar to using the variable with largest size and alignment (any of them if there are several), and reinterpret cast when one of the other types in the union is written/accessed, as in:

double dValue; // creates a variable with alignment & space
               // as per "union NumericType Values"
*reinterpret_cast<int*>(&dValue) = 10; // separate step equiv. to = { 10 }
printf("%d\n", *reinterpret_cast<int*>(&dValue)); // print as int
dValue = 3.1416;                                  // assign as double
printf("%d\n", *reinterpret_cast<int*>(&dValue)); // now print as int

The problem is that in setting dValue to 3.1416 you've completely overwritten the bits that used to hold the number 10. The new value may appear to be garbage, but it's simply the result of interpreting the first (sizeof int) bytes of the double 3.1416, trusting there to be a useful int value there.

If you want the two things to be independent - so setting the double doesn't affect the earlier-stored int - then you should use a struct/class.
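A minimal sketch of that difference, with made-up type names (Overlapping, Independent) and a small helper store_both that exists only for the example:

```cpp
// Members share one block of storage: writing d overwrites i's bytes.
union Overlapping
{
    int    i;
    double d;
};

// Members each get their own storage: writing d leaves i alone.
struct Independent
{
    int    i;
    double d;
};

// Stores both values in a struct; both remain readable afterwards.
Independent store_both(int i, double d)
{
    Independent s;
    s.i = i;
    s.d = d;
    return s;
}
```

store_both(10, 3.1416) gives back an object whose i is still 10 and whose d is 3.1416. The price is size: the struct needs room for both members, while the union only needs room for the largest one.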

It may help you to consider this program:

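#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint8_t
#include <iostream>

// Prints n bytes starting at pv, most significant bit of each byte first.
void print_bits(std::ostream& os, const void* pv, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        std::uint8_t byte = static_cast<const std::uint8_t*>(pv)[i];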
        for (int j = 0; j < 8; ++j)
            os << ((byte & (128 >> j)) ? '1' : '0');
        os << ' ';
    }
}

union X
{
    int i;
    double d;
};

int main()
{
    X x = { 10 };
    print_bits(std::cout, &x, sizeof x);
    std::cout << '\n';
    x.d = 3.1416;
    print_bits(std::cout, &x, sizeof x);
    std::cout << '\n';
}

Which, for me, produced this output:

00001010 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
10100111 11101000 01001000 00101110 11111111 00100001 00001001 01000000

Crucially, the first half of each line shows the 32 bits that are used for the int member: note the 1010 binary in the least significant byte (on the left on an Intel CPU like mine) is 10 decimal. Writing 3.1416 changes the entire 64 bits to a pattern representing 3.1416 (see http://en.wikipedia.org/wiki/Double_precision_floating-point_format). The old 1010 pattern is overwritten, clobbered, an electromagnetic memory no more.
