C++ unions vs. reinterpret_cast

Question

It appears from other StackOverflow questions, and from reading §9.5.1 of the ISO/IEC draft C++ standard, that using a union to do a literal reinterpret_cast of data is undefined behavior.

Consider the code below. The goal is to take the integer value 0xffff and interpret its bits literally as an IEEE 754 single-precision floating-point number. (A binary converter shows visually how this is done.)

#include <iostream>
using namespace std;

union unionType {
    int myInt;
    float myFloat;
};

int main() {

    int i = 0xffff;

    unionType u;
    u.myInt = i;

    cout << "size of int    " << sizeof(int) << endl;
    cout << "size of float  " << sizeof(float) << endl;

    cout << "myInt          " << u.myInt << endl;
    cout << "myFloat        " << u.myFloat << endl;

    float theFloat = *reinterpret_cast<float*>(&i);
    cout << "theFloat       " << theFloat << endl;

    return 0;
}

With both GCC and Clang, this code produces the expected output:

size of int    4
size of float  4
myInt          65535
myFloat        9.18341e-41
theFloat       9.18341e-41

My question is, does the standard actually preclude the value of myFloat from being deterministic? Is the use of a reinterpret_cast better in any way to perform this type of conversion?

The standard states the following in §9.5.1:

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.

The last sentence, which guarantees that all non-static members have the same address, suggests that using a union is equivalent to using a reinterpret_cast, but the earlier statement about active data members seems to rule that guarantee out.

So which construct is more correct?

Edit: Using Intel's icpc compiler, the above code produces even more interesting results:

$ icpc union.cpp
$ ./a.out
size of int    4
size of float  4
myInt          65535
myFloat        0
theFloat       0

It's UB. It's no more correct than to say `uint32_t x; *(float*)(&x) = 1.5;`. The correct way to interpret an object as a series of bytes is to consider it as a `char[]`. — Kerrek SB, May 19 '13 at 16:56

score 9 · Accepted Answer · answered May 19 '13 at 16:59

It's undefined because there's no guarantee about what exactly the value representations of int and float are. The C++ standard doesn't say that a float is stored as an IEEE 754 single-precision floating-point number. What exactly should the standard say about treating an int object with value 0xffff as a float? It says nothing, other than that doing so is undefined.

Practically, however, this is the purpose of reinterpret_cast - to tell the compiler to ignore everything it knows about the types of objects and trust you that this int is actually a float. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.

This is true for both the union and reinterpret_cast approaches. I suggest that reinterpret_cast is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.

Joseph Mansfield

Does that mean that it's "valid" to use undefined behaviour here, even though the compiler is also allowed to say "well, this is undefined behaviour, I don't need to make this work"? – Mats Petersson May 19 '13 at 17:02
@MatsPetersson It depends on what you mean by "valid". If you want to have the support of the C++ standard in knowing that, with a conformant compiler, your program has exactly one well-defined execution path for some specific input, then you definitely don't want to invoke undefined behaviour. However, if you can guarantee that a program that has undefined behaviour works correctly in any situation it will ever be used, then you might consider that "valid" - it's just not "well-defined" as C++. That's true of any undefined behaviour. – Joseph Mansfield May 19 '13 at 17:05
@MatsPetersson: Quite a bit of “undefined” behaviour is actually implementation-defined because implementors *have* made choices about how to compile the code. So you’re not necessarily relying on things that *shouldn’t* work, just things that can *differ* between compilers and platforms. – Jon Purdy May 19 '13 at 17:06
@JonPurdy Exactly. Undefined behaviour just means the implementation doesn't have to document it. In the case of `reinterpret_cast`, whether it's undefined or not, it doesn't do anything special. It's telling the compiler to *not* check whether the cast is safe. Either way, you get a pointer that the compiler considers to be of the new type. – Joseph Mansfield May 19 '13 at 17:11
@sftrabbit Ok, thanks for clarifying that. It seems like a lot of times, people are very quick to point out that something is UB, as if it's to be avoided at all cost, which is why I asked the question. Of course, if it's implementation defined, I guess there is a risk that another version of the same compiler may behave differently, which can throw spanners into the works (and one reason why big projects are very reluctant to change compilers, even if the new compiler is "much better" [by whatever definition of better we may have]). – Mats Petersson May 19 '13 at 21:27
@MatsPetersson Well, I do recommend avoiding it as much as possible. You can generally achieve anything you might need without invoking undefined behaviour. It can lead you to all sorts of problems, especially if the compiler performs some optimizations assuming that you don't have undefined behaviour. To be able to use it safely, you'll need to know exactly how your compiler and platform work, so it can only really be useful for very low-level, platform-specific code. – Joseph Mansfield May 19 '13 at 21:34
@sftrabbit Ok, that makes sense. Of course, that tends to be the part of programming where I've spent a lot of my life... Drivers and operating system code and such - more C than C++, but some C++ too. – Mats Petersson May 19 '13 at 21:37

score 7 · Answer 2 · answered May 19 '13 at 17:34

It's not undefined behavior; it's implementation-defined behavior. The former means that bad things can happen. The latter means that the implementation has to define what will happen.

The reinterpret_cast violates the strict aliasing rule, so I do not think it will work reliably. The union trick is what people call type punning, and compilers usually allow it. The GCC folks document the compiler's behavior: http://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation

I think this should work with icpc as well (though Intel does not appear to document how they implemented it). But when I looked at the assembly, it looked like icc was cheating with float and using higher-precision floating-point arithmetic. Passing -fp-model source to the compiler fixed that: with that option, I get the same results as with gcc. I do not think you want to use this flag in general; this is just a test to verify my theory.

So I think that if you switch your code from int/float to long/double, type punning will work on icpc as well.

Interestingly, switching to long/double produces the same output as the int/float conversion unless `-fp-model source` is set. — kgraney, May 19 '13 at 17:54
Are you compiling a 64-bit binary? I've tried your problem with long/double with -O3 on the latest iclc and I get the expected result in the union — Guillaume, May 19 '13 at 18:32
Yep, using "icpc (ICC) 13.0.2 20130314" on 64-bit OSX, creating a 64-bit binary. No difference in the output between using -O3 and not using -O3. — kgraney, May 19 '13 at 19:08
Hmm, wfm on linux. But since it's not documented, I am not surprised. I am surprised however that intel does not document some form of type-punning (or I am just bad at searching their doc) — Guillaume, May 20 '13 at 17:47

score 1 · Answer 3 · answered May 19 '13 at 17:04

Undefined behavior does not mean bad things must happen. It means only that the language definition doesn't tell you what happens. This kind of type pun has been part of C and C++ programming since time immemorial (i.e., since 1969); it would take a particularly perverse implementor to write a compiler where this didn't work.
