7

I'm able to print the addresses and values of the ints, but not those of the chars in the union. Why is that?

#include <iostream>

using namespace std;

union Endian
{
    int i;
    char c[sizeof(int)];
    int j;
};

int main(int argc, char *argv[]) {
    Endian e;
    e.i = 20;
    cout << &e.j;
    cout << &e.i;
    cout << &e.c[0]; //Why can't I print this address
    cout << e.c[1]; // Why can't I print this value

}

Output: 0x7fff5451ab68 0x7fff5451ab68

tez
  • Pipe your program's output through `cat -A` or `xxd` to see what's going on, and accept NPE's answer. – jthill Apr 11 '13 at 21:31

3 Answers

22

Disclaimer: OP's tags are quite ambiguous, so this answer uses the code as a frame of reference, which is C++ (use of iostream, pulling in the std namespace, cout).

You're using union in an inappropriate way. But we'll get back to that later.

e.i = 20;

Your code first uses the union as i, an integer. Which is okay. But what you did afterwards is really not a good idea. First you did two somewhat acceptable things:

cout << &e.j;
cout << &e.i;

You queried the address of the two ints in the union, which is marginally fine because they all share storage and the address of the first byte is therefore shared.

cout << &e.c[0]; //Why can't I print this address
cout << e.c[1]; // Why can't I print this value

Now, here's where you cross the line. Indexing into the char[] array is implicit pointer arithmetic plus a dereference, and even though you only take the address of the first element, the expression refers to a member that is not the one last set in the union. So, that's a big no-no.

Furthermore, &e.c[0] has type char*, which cout "intercepts" and treats as a C-style string. It will not print it as a plain address.
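A minimal sketch of that stream behaviour, using a plain `char` array instead of the OP's union so the aliasing question does not come into it. The helper names `format_address` and `format_as_int` are made up for illustration. Casting to `const void*` selects the pointer overload of `operator<<`, and casting a `char` to `int` prints its numeric value instead of a possibly non-printable character.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format the address held by a char pointer.
// Without the cast, operator<< would treat p as a C-style string.
std::string format_address(const char* p) {
    std::ostringstream out;
    out << static_cast<const void*>(p);
    return out.str();
}

// Hypothetical helper: format a char as a number rather than as a glyph.
std::string format_as_int(char ch) {
    std::ostringstream out;
    out << static_cast<int>(ch);
    return out.str();
}
```

With `char buf[4] = {20, 0, 0, 0};`, streaming `format_address(&buf[0])` shows an implementation-specific address, and `format_as_int(buf[0])` gives the text `20`.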

cout << e.c[1]; // Why can't I print this value

Undefined behavior. "But, but!", I can hear some of you say. Yes, it is UB in C++. Valid in C99 (6.5/7), and just barely by means of a footnote and some duct tape. It's a simple matter, already explained by Lightness Races in Orbit and Mysticial in the comments of this answer and others.

Yes, you can cast any typed variable you have to a char array and mess with it for whatever purpose you have in mind. But type-punning through unions is illegal in C++, no buts and no excuses. Yes, it may work. Yes, if you're not bothered by it, you may continue to use it. But per the C++ standard, it is clearly illegal.
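As a hedged sketch of the cast-based route mentioned above: the bytes of an object can be inspected through an `unsigned char` pointer, with no union involved. The helper name `bytes_of` is an assumption made for this example.

```cpp
#include <vector>

// Hypothetical helper: copy the object representation of an int,
// byte by byte, in the order the bytes sit in memory.
std::vector<unsigned char> bytes_of(const int& value) {
    // Access through unsigned char is permitted for any object.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    return std::vector<unsigned char>(p, p + sizeof(int));
}
```

For `int v = 20;`, `bytes_of(v)` has `sizeof(int)` elements, and which of them holds 20 depends on the machine's byte order.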

Unless that member was the last member of the union to which you assigned a value, you shall not retrieve its value. It's as simple as that.

Unions in C++ have a purpose, described below. They can also have member functions and access specifiers. They cannot have virtual functions or static members. Neither can they be used as a base class or inherit from something. And they are not to be used for type-punning. It's illegal in C++.

Read further!

Understanding unions

A union is:

  • A way to allow memory reuse.
  • That's it.

A union is not:

  • A way to cowboy-cast between elements of the union
  • A way to cheat strict aliasing.

Even MSDN's got it right:

A union is a user-defined data or class type that, at any given time, contains only one object from its list of members (although that object can be an array or a class type).

What does this mean? It means that you can define something along the lines of this:

union stuff {

    int i;
    double d;
    float f;    

} m;

The idea is that all of them sit in the same space in memory. The size of a union is at least that of its largest member in a given implementation. Platforms have a lot of freedom here. Freedom the specifications cannot cover. Not C. Not C++.

You must not write to the union as an int and then read it as a float (or anything else) as a way of some weird cowboy reinterpret_cast.

The use of std::cout is for example purposes and simplicity.

This is illegal:

m.i = 5;
std::cout << m.f; // NO. NO. NO. Please, no.

This is legal:

m.i = 5;
std::cout << m.i;

// Now I'm done with i, I have no intention of using it
// If I do, I'll make sure I properly set it.

m.f = 3.0f;
std::cout << m.f; // No "cowboy-interpreting", defined.

// I've got an idea, but I need it to be an int.

m.i = 3; // m.f and m.d are here-by invalidated.
int lol = 5;
m.i += lol;

Notice how there's no "cross-fire". This is the intended usage. Slim memory storage for three variables used at three different times with no fighting.
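One way to keep this discipline honest is to store a tag next to the union that records which member was written last. This is only a sketch under that assumption; `Number`, `Kind`, `make_int`, `make_float` and `to_double` are hypothetical names, not anything from the question.

```cpp
// Hypothetical tagged wrapper: the enum records which member is active.
struct Number {
    enum class Kind { Int, Float } kind;
    union {      // anonymous union: members are accessed as n.i and n.f
        int i;
        float f;
    };
};

// Each factory writes one member and sets the matching tag.
Number make_int(int v) {
    Number n;
    n.kind = Number::Kind::Int;
    n.i = v;
    return n;
}

Number make_float(float v) {
    Number n;
    n.kind = Number::Kind::Float;
    n.f = v;
    return n;
}

// Reads only the member that the tag says was written last.
double to_double(const Number& n) {
    return n.kind == Number::Kind::Int ? static_cast<double>(n.i)
                                       : static_cast<double>(n.f);
}
```

The storage is still shared, but every read goes through the tag, so there is no "cross-fire" between members.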

How did the misconception arise? Some very bad people woke up one day and I bet one of them was a 3D programmer and thought about doing this:

// This is wrong on so many different levels.
union {

    float arr[4];
    struct {
        float x,y,z,w;
    };

};

He undoubtedly had a "noble idea", to access a 4-tuple both as a float array and as individual xyzw members. Now, you know why this is wrong in terms of unions, but there is one more failure in here:

C++ does not have anonymous structs. It does have anonymous unions, for purposes illustrated above to bring it closer to the intended usage (dropping the m. "prefix"), as you can surely see how that benefits the general idea behind unions.

Don't do this. Please.
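If both indexed and named access to a 4-tuple are wanted, one possible alternative (a sketch, not the only way) is a single array with accessor functions on top. The type `Vec4` and its members are hypothetical names chosen for this example.

```cpp
#include <cstddef>

// Hypothetical 4-component vector: one array, named accessors over it.
struct Vec4 {
    float v[4];

    float& x() { return v[0]; }
    float& y() { return v[1]; }
    float& z() { return v[2]; }
    float& w() { return v[3]; }

    // Indexed access to the same storage; no bounds checking.
    float& operator[](std::size_t i) { return v[i]; }
};
```

Here `p.y()` and `p[1]` name the same `float`, so no union and no anonymous struct are needed.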

  • 5
    And just to clarify. [Union type-punning has been legal (but implementation-defined behavior) since C99 TR2.](http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified) But it remains UB in C++ as of C++11. – Mysticial Apr 11 '13 at 21:06
  • 4
    Although type-punning through a union is technically UB in C89 and C++, it is an extremely common idiom and is well-supported by all major compilers. – Adam Rosenfield Apr 11 '13 at 21:10
  • 1
    However, accessing plain object as characters has *always* been intentionally legal and supported behavior. This answer is almost entirely irrelevant to the poster's situation and question. – jthill Apr 11 '13 at 21:11
  • 3
    @jthill: That was only ever legal if you cast a pointer. As far as I am aware access through a union holds no such exemption. – Puppy Apr 11 '13 at 21:12
  • In C++ that's not true, it's access via any lvalue of type char that's permitted. – jthill Apr 11 '13 at 21:14
  • @jthill `e.i = 20; cout << e.c[1];` Illegal. End of story. –  Apr 11 '13 at 21:19
  • 1
    And the C standard uses effectively identical language. @DomagojPandža Please point out where the standard forbids character access to an object's data. – jthill Apr 11 '13 at 21:26
  • 1
    You're stating something from the C standard, namely 6.5/7. Yes, it is legal in the C language to reinterpret the content of any object as a char array through a union. But C++ does not. OP wrote C++ code, not C. Now, it may work. But it's technically UB. You and the OP are more than welcome to indulge yourself in unsanctioned bad practice and disregard everything I wrote. I'm done here. I was only trying to help. –  Apr 11 '13 at 21:42
  • 1
    If anything, C++11 standard allows type-punning only on PODs with same layouts. It doesn't allow such access through unions. Yes, you can interpret everything as a char array, but in C++ you do it with a proper cast. This is not the usage of unions. You may not like it, but that's the way it is. It will *probably* work. If that's good enough for you, knock yourself out. –  Apr 11 '13 at 21:52
  • 1
    C++ 3.10p10: "a `char` or `unsigned char` type". C 6.5p7: "a character type". That's the entire relevant difference between the accesses sanctioned by the two languages. And, btw, C++ 9p10 says the OP's union is POD. – jthill Apr 11 '13 at 21:54
  • POD **with the same layout**. Modifying the union as a char array damages everything else in OP's case, it is not the same layout. They **are not** mapped 1:1. I don't know what you are trying to do here. That is not the usage of unions. It simply is not. –  Apr 11 '13 at 22:00
  • Please point out where either standard draws the distinction you claim between `((char*)&e)[1]` and `e.c[1]` – jthill Apr 11 '13 at 22:05
  • Is the OP type-punning through a union **and** the code is C++? **Yes. Refer to the first comment on this answer.** –  Apr 11 '13 at 22:13
  • [Also, it seems others have already explained this to you.](http://stackoverflow.com/questions/15952204/using-char-array-inside-union/15953675#comment22733902_15952308) There's a huge difference between properly casting something to a char array for whatever evil purposes you have in mind and illegal type-punning through a union. –  Apr 11 '13 at 23:11
  • 1
    Please cite language in the standard to support your assertions. That both standards also support the struct hack does not imply they don't support `char` references. In particular, I think you're going to have a hard time reconciling your claims with C's 6.5p6 and a much harder time with C++'s 3.9p2, "For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char." – jthill Apr 12 '13 at 01:43
  • @jthill No assertions here. "Right, and unless that member was the last member of the union to which you assigned a value, you shall not retrieve its value. It's as simple as that." That's the only point we're making. Nothing further. End of story. Goodbye. –  Apr 12 '13 at 01:53
  • 4
    This is now a thing I have: https://dl.dropboxusercontent.com/u/17632594/cowboy_cast.png –  Apr 12 '13 at 02:27
4

Contrary to what I said earlier (that, strictly speaking, the behaviour of your code is undefined), the behaviour of the code is not undefined (I think it's implementation-defined). See https://stackoverflow.com/a/1812932/367273 for an explanation.

What happens is that &e.c[0] is of type char*, and therefore gets printed as a C string, not as a pointer. The string is either blank or consists of non-printable characters, so you see no output. A similar thing happens to e.c[1], except that it's a single char and not a string.

When I initialize e as follows:

e.i = 0x00424344;

the last two lines print DCB and C respectively (this exploits the fact that on my machine, int is 32 bits wide and little-endian).
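The same experiment can be sketched without reading a union member at all, by copying the bytes with `std::memcpy`. This assumes a 32-bit value via `std::uint32_t`; the helper name `bytes_as_c_string` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: return the bytes of a 32-bit value as they sit in
// memory, stopping at the first zero byte the way a C-string print would.
std::string bytes_as_c_string(std::uint32_t value) {
    char raw[sizeof(value) + 1] = {};          // extra zero byte guarantees termination
    std::memcpy(raw, &value, sizeof(value));   // well-defined byte-wise copy
    return std::string(raw);
}
```

On a little-endian machine `bytes_as_c_string(0x00424344)` yields `DCB`, because the least significant byte (0x44, 'D') is stored first. On a big-endian machine the first byte is zero, so the result is an empty string.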

NPE
  • 1
    The compiler is not zeroing out e.c -- he's setting e.i to 20, so the MSB of the two byte value is 0 and the LSB is 20. – K Scott Piel Apr 11 '13 at 15:05
  • @KScottPiel: Thanks. I misread the `union` as a `struct` - fixing the answer now. – NPE Apr 11 '13 at 15:06
  • No, it isn't undefined. You can access any C-like type through a char* and the compiler's required to behave as if it were really an array of `char`. – jthill Apr 11 '13 at 15:14
  • This question is tagged both C and C++. In C, some access to members other than the last one stored is defined. I do not see language in the C++ standard that defines such behavior. – Eric Postpischil Apr 11 '13 at 15:18
  • I am starting to regret attempting to answer this (which I only did having initially misread the question as being about a `struct` and not a `union`)! :-) – NPE Apr 11 '13 at 15:19
  • The corresponding list of valid accesses in the C++ standard is in \[basic.lval\], 3.10p10. – jthill Apr 11 '13 at 15:27
2

It's Undefined Behaviour to access a member of a union other than the one last set, at least in C++.

Whilst taking an address is legal in theory, that's not what unions are for.

Bartek Banachewicz
  • @KScottPiel No. The point of a union is to be able to share storage _location_ for objects of _different types_, of which you know that only one will ever be used at any given time. – sehe Apr 11 '13 at 15:10
  • Accessing through a `char*` is a special case, you can do that and the compiler's required to respect it. – jthill Apr 11 '13 at 15:11
  • 1
    @KScottPiel and still it's UB to access it the way I stated. What's so hard to understand there? And I can't see how invoking UB is *splitting hairs*. – Bartek Banachewicz Apr 11 '13 at 15:12
  • 2
    @jthill can you provide some reference to that statement? – Bartek Banachewicz Apr 11 '13 at 15:13
  • 2
    @KScottPiel: What the hell is pointless about debating semantics of the language? That's the _only thing to do here_. – Lightness Races in Orbit Apr 11 '13 at 15:13
  • Note that C does provide some definition for storing to one member of a union and accessing another. There are some constraints and provisions for including implementation-defined aspects of the behavior, but it is defined in many situations. I do not see the same language in C++. So it might be useful to clarify the distinction. Is there language in the C++ standard that makes it clear that accessing a different member is not defined? – Eric Postpischil Apr 11 '13 at 15:16
  • 2
    I'll retract my statement... but I'm stunned by the fact that the code I've written over the past 30+ years worked anyway. Whatever. – K Scott Piel Apr 11 '13 at 15:17
  • @BartekBanachewicz [Google it](https://www.google.com/search?q=c+access+through+char+pointer+undefined). – jthill Apr 11 '13 at 15:18
  • @KScottPiel: UB is not required to cause a failure to compile or "work". You got [un]lucky. (Which, to be completely fair, is the general case with this manner of treating `union`s, which _most_ implementations happen to handle safely.) – Lightness Races in Orbit Apr 11 '13 at 15:19
  • @jthill isn't it for C? My question explicitly states C++. – Bartek Banachewicz Apr 11 '13 at 15:20
  • 1
    @jthill: What does casting have to do with it? You could cast a member to a `char` array and examine it, but it would still have to be the last member you set. You can't just arbitrarily pick whichever member you like, even if one of those were a `char` array. `[C++11: 9.5/1]` plainly states: `In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.` The _only_ exception is the prefix rule for standard-layout `struct` members. – Lightness Races in Orbit Apr 11 '13 at 15:20
  • @LightnessRacesinOrbit The value of `e.c` is a char pointer. `e.c[1]` is an access through a char pointer. – jthill Apr 11 '13 at 15:37
  • 1
    @jthill: Right, and unless that member was the _last_ member of the union to which you assigned a value, _you shall not retrieve its value_. It's as simple as that. – Lightness Races in Orbit Apr 11 '13 at 15:42
  • I'm sorry, but that's simply not true. You can copy structs memberwise or you can copy them a byte at a time, either is required to work. – jthill Apr 11 '13 at 15:46
  • @jthill I thought we were speaking about the unions, not the structs. – Bartek Banachewicz Apr 11 '13 at 16:27