Structures and Unions in C, determining size and accessing members

Question

All,

Here is an example on Unions which I find confusing.

struct s1
{
    int a;
    char b;
    union
    {
       struct
       {
          char *c;
          long d;
       }
       long e;
     }var;
};

Assume that char is 1 byte, int is 2 bytes and long is 4 bytes. What would be the size of the entire struct? Will the union size be {size of char*} + {size of double}? I am confused because of the struct wrapped in the union.

Also, how can I access the variable d in the struct? Would it be var.d?

You can't know for sure, because the compiler can pad these constructs for better/proper data alignment. — Joe, Jul 31 '10 at 23:07
But padding is an option the programmer needs to mention right? What would be the memory layout if the code is compiled as it is above — name_masked, Jul 31 '10 at 23:09
No, the compiler can (and often will) insert padding by default. — Matthew Flaschen, Jul 31 '10 at 23:10
According to standards, padding is implementation defined. Some (most in my experience) provide a means to control it, but the standard does not require more than that the implementer document what padding will be used. And that the offset of the first element of a struct is 0, meaning that a pointer to an instance of a struct can be cast to a pointer to its first element (and vice versa) safely. — RBerteig, Jul 31 '10 at 23:12
"Considering that char is 1 byte" => char is always no matter implementation 1 byte — , Jul 31 '10 at 23:17
Curious - the code I see won't compile; the `union` needs a name and a semi-colon after the `struct { ... }` and before the `long`. And the name you provide there is needed to access `d`; that will be `var.your_chosen_name.d`, of course. — Jonathan Leffler, Jul 31 '10 at 23:29
Also: are you compiling with 32-bit pointers or 64-bit pointers? And what are the alignment requirements of the system on pointers (8-byte aligned would be normal for 64-bit; 4-byte aligned for 32-bit). There again, if `int` is 2 bytes, you are unlikely to be using a 64-bit machine - you might even be using a 16-bit machine with 16-bit pointers, but that's a little unlikely. AFAICS, `sizeof(double)` does not factor into the calculations; there are no `double` members in the structure shown. — Jonathan Leffler, Jul 31 '10 at 23:33

Matthew Flaschen · Answer 1 · 2010-07-31T23:21:42.287

The sizes are implementation-defined, because of padding. A union will be at least the size of its largest member, while a struct will be at least the sum of its members' sizes. The inner struct will be at least sizeof(char *) + sizeof(long), so the union will be at least that big. The outer struct will be at least sizeof(int) + 1 + sizeof(char *) + sizeof(long). All of the structs and unions can have padding.

You are using an extension to the standard, unnamed fields (C11 later standardized anonymous structs and unions). In ISO C before C11, there would be no way to access the inner struct. But in GCC (and I believe MSVC), you can do var.d.

Also, you're missing the semi-colon after the inner struct.
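
A minimal sketch of the corrected declaration, assuming you are free to give the inner struct a name. The member name `inner` and the helper `store_d` are hypothetical and not part of the original code. With a named member, `d` is reached as `var.inner.d` in any ISO C version:

```c
#include <assert.h>

struct s1
{
    int a;
    char b;
    union
    {
        struct
        {
            char *c;
            long d;
        } inner;    /* hypothetical name; the semicolon here was missing */
        long e;
    } var;
};

/* Writes d through the fully named path and reads it back. */
long store_d(struct s1 *s, long value)
{
    s->var.inner.d = value;

    /* e and inner share storage: both start at the union's address. */
    assert((void *)&s->var.e == (void *)&s->var.inner);

    return s->var.inner.d;
}
```

If the inner struct is left unnamed and the compiler supports anonymous structs, the same member would be reached as `var.d` instead.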

score 1 · Accepted Answer · edited May 23 '17 at 10:24

With no padding, and assuming sizeof(int)==sizeof(char *)==sizeof(long)==4, the size of the outer struct will be 13.

Breaking it down, the union var overlaps an anonymous struct with a single long. That inner struct is larger (a pointer and a long) so its size controls the size of the union, making the union consume 8 bytes. The other members are 4 bytes and 1 byte, so the total is 13.

In any sensible implementation with the size assumptions I made above, this struct will be padded to either 2-byte or 4-byte boundaries, adding at least 1 or 3 additional bytes to the size (for a total of at least 14 or 16 bytes).
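
The arithmetic above can be checked against a real compiler. The following is a sketch only: it assumes the inner struct has been given the hypothetical name `inner` so that the declaration compiles, and the helper names `s1_unpadded_size` and `s1_padding` are made up for illustration. It computes the padding-free lower bound and the number of padding bytes the implementation actually added:

```c
#include <assert.h>
#include <stddef.h>

struct s1
{
    int a;
    char b;
    union
    {
        struct
        {
            char *c;
            long d;
        } inner;    /* hypothetical name */
        long e;
    } var;
};

/* Size with no padding at all: int + char + the larger union member. */
size_t s1_unpadded_size(void)
{
    size_t inner_size = sizeof(char *) + sizeof(long);
    size_t union_size = inner_size > sizeof(long) ? inner_size : sizeof(long);
    return sizeof(int) + sizeof(char) + union_size;
}

/* Number of padding bytes this implementation inserted into struct s1. */
size_t s1_padding(void)
{
    /* The real size can never be smaller than the unpadded sum. */
    assert(sizeof(struct s1) >= s1_unpadded_size());
    return sizeof(struct s1) - s1_unpadded_size();
}
```

With the 4-byte sizes assumed above, `s1_unpadded_size()` would give 13. On other platforms both results depend on the sizes and alignment rules of that implementation.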

Edit: In general, since the sizes of all of the member types are themselves implementation defined, and the padding is implementation defined, you need to refer to the documentation for your implementation and the platform to know for sure.

The implementation is allowed to insert padding after essentially any element of a struct. Sensible implementations use as little padding as required to comply with platform requirements (e.g. RISC processors often require that a value is aligned to the size of that value) or for performance.

If you use a struct to map fields onto a layout dictated by a file format specification, a coprocessor in shared memory, a hardware device, or any similar case where the packing and layout actually matter, then you should test at either compile time or run time that your assumptions about the member layout are true. This can be done by verifying the size of the whole structure, as well as the offsets of its members.
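
One way to do this, sketched under the assumption of a C11 compiler, is `static_assert` from `<assert.h>` combined with `offsetof` from `<stddef.h>`. The `record_header` struct and its fields are hypothetical and stand in for whatever external layout you need to match. The function `record_header_layout_ok` is an illustrative run-time equivalent for compilers without C11 support:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/* Hypothetical on-disk record header; the field names are illustrative. */
struct record_header
{
    uint32_t magic;     /* expected at offset 0 */
    uint16_t version;   /* expected at offset 4 */
    uint16_t flags;     /* expected at offset 6 */
    uint32_t length;    /* expected at offset 8 */
};

/* Compile-time checks: the build fails if the layout assumptions break. */
static_assert(offsetof(struct record_header, magic) == 0, "magic offset");
static_assert(offsetof(struct record_header, version) == 4, "version offset");
static_assert(offsetof(struct record_header, flags) == 6, "flags offset");
static_assert(offsetof(struct record_header, length) == 8, "length offset");
static_assert(sizeof(struct record_header) == 12, "header size");

/* Run-time version of the same checks; returns 1 if the layout matches. */
int record_header_layout_ok(void)
{
    return offsetof(struct record_header, version) == 4
        && offsetof(struct record_header, flags) == 6
        && offsetof(struct record_header, length) == 8
        && sizeof(struct record_header) == 12;
}
```

The compile-time form is usually preferable because a layout mismatch stops the build instead of surfacing later as corrupted data.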

See this question among others for a discussion of compile-time assertion tricks.

score -3 · Answer 3 · answered Aug 01 '10 at 03:05

Unions are dangerous and risky to use without strict discipline. And the fact you put it in a struct is really dangerous because by default all struct members are public: that exposes the possibility of client code making changes to your union, without informing your program what type of data it stuffed in there. If you use a union you should put it in a class where at least you can hide it by making it private.

We had a dev years ago who drank the koolaid of unions and put them in all his data structures. As a result, the features he wrote with them are now among the most despised parts of our entire application, since they are unmodifiable, unfixable and incomprehensible.

Also, unions throw away all the type safety that modern C/C++ compilers give you. Surely, if you lie to the compiler, it will get back at you someday. Well, actually, it will get back at your customer when your app crashes.

You might notice that this question is tagged C, but *not* C++. Yes this is a rare case where the OP actually correctly identified the language rather than talking about some mythical C/C++ hybrid. In any case, he is asking about C and classes and protection are not features of the C language. — RBerteig, Aug 04 '10 at 02:05
