Type punning - how does the compiler decide what type to use?

Question

I was reading this question here about deciding endianness and the first answer baffled me somewhat.

The code used to decide big endianness is as follows:

#include <stdint.h>

int is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};

    /* nonzero if the lowest-addressed byte is the most significant one */
    return bint.c[0] == 1;
}

My question is how does the compiler here decide what type to use for that array of hex digits? Technically, it fits equally well in either the uint32_t or the char[4].

Why not just store it in the char[4] and skip the union?

Is there some advantage of a union here that I don't see? I know this is called type-punning, but I fail to see its advantage here.

You mean `char *c = (char *)&i;`? That's fine too, because `char` can alias any object. — , Aug 22 '13 at 14:16
Why not just store int in a char array? Because otherwise, you wouldn't be able to check the endianness. If you used a char array directly, then c[0] will always be 1, no matter which endianness is used. — mfontanini, Aug 22 '13 at 14:16
In little endian, c[0] is 0x04 and in big endian c[0] is 0x01. That is why the union is used. You could also cast i to (char&) and peek. — Neil Kirk, Aug 22 '13 at 14:17

score 6 · Accepted Answer · answered Aug 22 '13 at 14:18

My question is how does the compiler here decide what type to use for that array of hex digits?

As with arrays and aggregate classes, the first initialiser initialises the first member; in this case i. (Of course, unlike those things, it doesn't make sense to have more than one initialiser).

Why not just store it in the char[4] and skip the union? Is there some advantage of a union here that I don't see?

The purpose of this is to initialise the 4-byte integer, then use the char array to examine the individual bytes to determine the memory order. If the most significant byte (0x01) is stored in the first byte, then the system is "big-endian"; otherwise it's "little-endian" (or perhaps something stranger).
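The byte inspection described above can also be sketched without a union. This is only an illustration in C, not the code from the linked answer: the names `detect_byte_order` and `ORDER_*` are made up here, and it copies the bytes out with `memcpy` instead of reading a second union member.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, used only for this sketch. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

enum byte_order detect_byte_order(void)
{
    const uint32_t value = 0x01020304;
    unsigned char bytes[sizeof value];

    /* Assumption: 8-bit bytes, so a uint32_t occupies exactly 4 of them. */
    assert(sizeof bytes == 4);

    /* Copy the object representation of value into a byte array. */
    memcpy(bytes, &value, sizeof value);

    /* Most significant byte (0x01) stored first: big-endian. */
    if (bytes[0] == 0x01 && bytes[1] == 0x02 &&
        bytes[2] == 0x03 && bytes[3] == 0x04)
        return ORDER_BIG;

    /* Least significant byte (0x04) stored first: little-endian. */
    if (bytes[0] == 0x04 && bytes[1] == 0x03 &&
        bytes[2] == 0x02 && bytes[3] == 0x01)
        return ORDER_LITTLE;

    /* Anything else is one of the "stranger" orders. */
    return ORDER_OTHER;
}
```

Checking all four bytes rather than only the first is a choice made for this sketch, so that mixed orders are reported separately instead of being counted as little-endian.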

answered Aug 22 '13 at 14:18

Mike Seymour

+1 this, but `char *c = (char *)&i;` is *still* correct. Maybe OP is looking for something like "the author of the code decided that a union is more elegant than casting pointers"? – Aug 22 '13 at 14:24
@H2CO3 I'm really just trying to understand the logic behind the way it was done in that answer (i.e. with the union). – Tony The Lion Aug 22 '13 at 14:26
@H2CO3: Yes, there are alternative ways to get type punning; but that's beyond the scope of a question that just asks what the `union` was for. – Mike Seymour Aug 22 '13 at 14:27
@TonyTheLion As I said, I think that's because the author liked the union better. Or perhaps he wanted to stay safe no matter what: if you change `char` (or `unsigned char`) to anything else, the pointer-casting approach will violate the strict aliasing rule, but the union-based solution will still work in C99. It is, however, still UB in C++. – Aug 22 '13 at 14:28

score 2 · Answer 2 · answered Aug 22 '13 at 14:18

The original C standard only allowed initialising the first member of a union. This means that 0x1020304 is assigned to "i", not to "c".

The latest C standard allows assigning to any member like this:

union { ... } bint = { .c = {1,2,3,4} };
union { ... } bint2 = { .i = 0x1020304 };

However - as said - if no name is given then the value is assigned to "i".
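A minimal sketch of the three initialiser forms side by side, assuming a C99 or later compiler. The union tag `bint` and the function `demo_union_initialisers` are names chosen for this illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Same members as the union in the question, with a tag so it can be reused. */
union bint {
    uint32_t i;
    char c[4];
};

/* Hypothetical demo: checks which member each initialiser form sets. */
void demo_union_initialisers(void)
{
    union bint plain   = {0x01020304};          /* no name: first member, i */
    union bint named_i = { .i = 0x01020304 };   /* named: i, explicitly */
    union bint named_c = { .c = {1, 2, 3, 4} }; /* named: c, byte by byte */

    assert(plain.i == 0x01020304);
    assert(named_i.i == plain.i);

    /* c was initialised directly, so c[0] is 1 whatever the byte order. */
    assert(named_c.c[0] == 1);
}
```

The last check also shows why initialising "c" directly would not help with detecting endianness: the bytes end up in the order they were written, not in the order the machine uses for an integer.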

answered Aug 22 '13 at 14:18

Martin Rosenau

Nitpick: the previous C standard allows named initialization of unions too :) – Aug 22 '13 at 14:19
Although the C standard has little to do with a question about C++. – Mike Seymour Aug 22 '13 at 14:21
@MikeSeymour Although raw character arrays have little to nothing to do with C++. This question smells like a mistagged one, right? – Aug 22 '13 at 14:22
@H2CO3: What do you mean, "arrays have little to do with C++"? They're exactly what you use if you're dealing with raw memory, as this does. Or are you saying that C++ should be used exclusively for high-level programming? I certainly don't agree with that. – Mike Seymour Aug 22 '13 at 14:25
@MikeSeymour That's what smart C++ programmers on SO say. I'm not C++-savvy enough to disagree with them. – Aug 22 '13 at 14:26
@H2CO3: people frequently use C constructs in C++ because of the compatibility, so... the OP may not even know this could be used as pure C. – Matthieu M. Aug 22 '13 at 14:27
@MatthieuM. Yeah, reasonable. I have been told off badly for writing C in C++, though. (Not without a reason, I must admit. I feel that some extremely C++-minded people are unhappy that `main()`'s second argument is `char **` and not `vector`... >.<) – Aug 22 '13 at 14:29
@H2CO3: Actually, with `vector` you would not need two arguments since the length is part of the vector... but this causes memory allocations... however in C++14 I could perfectly see an `int main(std::initializer_list args)` signature :D – Matthieu M. Aug 22 '13 at 14:38
@MatthieuM. After all, [Torvalds was right](http://article.gmane.org/gmane.comp.version-control.git/57918). – Aug 22 '13 at 14:41
@H2CO3 there is an important fact that should be recognised here: the character arrays are not being used for storage. They are used as a cheat to inspect the implementation-defined representation of another type (which is fine, since the whole point of the exercise is to determine characteristics of the implementation). Unlike when using them for storage, a use case for which C++ provides superior alternatives (let's assume they are superior for the sake of argument), C++ provides no other tools for inspecting the representation of an object. – R. Martinho Fernandes Aug 22 '13 at 14:47
@R.MartinhoFernandes So, [this](http://stackoverflow.com/questions/18382926/type-punning-how-does-the-compiler-decide-what-type-to-use/18383065?noredirect=1#comment26995709_18383079) is wrong? (Or what are you referring to? I'm not suggesting that the union solution is bad or inferior.) – Aug 22 '13 at 14:49
@H2CO3 no, that's fine. It's essentially the same thing, but with a lot less code involved and without the possibility of tiresome debates about legality. It's what I would use myself, though with an explicit `reinterpret_cast`. What I'm saying is this particular use of character arrays/pointers is "acceptable" in C++ because it is *the* tool that C++ provides for the use case. – R. Martinho Fernandes Aug 22 '13 at 14:54
@R.MartinhoFernandes Yeah, me too, probably. Then we agree :) – Aug 22 '13 at 14:54

score 0 · Answer 3 · answered Aug 22 '13 at 14:16

Because you want to store 0x01020304 as the unsigned 32-bit integer uint32_t i and then read the first byte char c[0].

edited Aug 22 '13 at 15:02

answered Aug 22 '13 at 14:16

Paul Evans
