
Won't the union in this question cause UB when used like this:

```cpp
union Data
{
    unsigned int intValue;
    unsigned char argbBytes[4];
};

int main()
{
    Data data;
    data.intValue = 1235347;
    unsigned char alpha = data.argbBytes[0]; // UB?
    (void)alpha; // silence the unused-variable warning
}
```

I'm thinking about 9.5/1 in the standard:

In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time.

Andreas Brinck

3 Answers


I suppose that'd be undefined, since what you've done is platform-specific: alpha would end up with a different value depending on whether your platform is big-endian or little-endian.

But the technique you show is pretty much equivalent to doing a reinterpret_cast.

I think the standard is pointing out that you can't store different values in both members (as they overlap in memory).

The real reason unions were invented was to let people fit more data into a smaller amount of memory. Traditionally, along with a union you'd store some marker (perhaps a bit or two in a bitmask) outside the union to remember which member is active. Using this marker, you'd carefully code accesses to the union so that you only ever read the active member.

Scott Langham
  • The thing is that doing something like `(reinterpret_cast<unsigned char*>(&data.intValue))[0]` is also UB (5.2.10/7). Even though I'm pretty sure that extracting a byte from the int through the union will work fine in practice, I still think it's UB. – Andreas Brinck Aug 13 '10 at 08:29
  • The standard is not pointing out that you can't have both members simultaneously. The problem is that, in general, the value of one member may not be convertible to a value of another member, e.g. you might have some trap bits. – jk. Aug 13 '10 at 08:35

In general you are right: writing a value of one type to a union and then reading it out as a different type is undefined behaviour. On the other hand, IIRC the standard explicitly allows any object to be accessed as a char array. It's never been 100% clear to me which rule takes precedence, but all implementations I have ever used allow union casting to do what you want anyway.

jk.
  • This union trick doesn't qualify as "casting" though, so the fact that everything can be converted to `char[]` is irrelevant. According to the standard, using a union like this is UB, but it is so common that compilers tend to explicitly guarantee that it'll work. – jalf Aug 13 '10 at 11:00
  • @jalf the Standard never explicitly says that using a union like this is UB. So to me, this just looks like an alias case, and aliasing any object using `unsigned char` is explicitly allowed. – Johannes Schaub - litb Aug 15 '10 at 13:35
  • @Johannes: hmm, fair point. I don't suppose you've got a reference or two handy regarding aliasing to `char` being allowed? – jalf Aug 15 '10 at 15:23
  • @jalf it's at `3.10/15` last bullet. C99 has an explicit footnote (non-normative) for this at `6.5.2/3`, which reads 'If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning").' On the normative side, my impression is that both C++ and C have the same specification. – Johannes Schaub - litb Aug 15 '10 at 15:32

It is not clear from the post what the size of 'int' is on the platform. Let us assume a 32-bit int and an 8-bit char, i.e. sizeof(int) == 4.

It is also not clear what the endianness of the machine is. Let us assume little-endian.

With that understanding, 0x12D993 (decimal 1235347) would be stored as

0x93 0xd9 0x12 0x00 (increasing address)

When this memory is accessed through 'argbBytes', the value of argbBytes[0] depends on the endianness of the machine. Therefore, it is not undefined behavior but implementation-defined behavior.

Chubsdad
  • It doesn't matter what the platform is. The standard says that this is undefined behavior, so it is UB. A specific compiler may guarantee that this works, but that is outside the scope of the standard, and can't be relied on to work on every compiler/platform. – KeithB Aug 13 '10 at 13:31
  • @KeithB: Please give the reference to the verse in the Standard. – Chubsdad Aug 15 '10 at 09:48
  • @chubsdad: wrong way around. It is undefined behavior if you can't point to a part of the standard which says it isn't. UB is the default. ;) – jalf Aug 15 '10 at 15:26