sizeof a union in C/C++

Question

What is the sizeof a union in C/C++? Is it the sizeof the largest datatype inside it? If so, how does the compiler calculate how to move the stack pointer if one of the smaller datatypes of the union is active?

score 69 · Answer 1 · edited May 23 '17 at 12:02

A union always takes up at least as much space as its largest member. It doesn't matter which member is currently in use.

union {
  short x;
  int y;
  long long z;
};

An instance of the above union will always take at least a long long for storage.
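
To make that concrete, here is a minimal sketch in C. The tag name `sample` and the helper names `sample_size` and `members_share_address` are made up for illustration. It checks that the union is large enough for every member and that all members start at the address of the union itself. No concrete sizes are hard-coded, since those depend on the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Same members as the union above; the tag name `sample` is illustrative. */
union sample {
    short x;
    int y;
    long long z;
};

/* Returns sizeof(union sample) after checking it can hold every member. */
size_t sample_size(void)
{
    size_t size = sizeof(union sample);

    assert(size >= sizeof(short));
    assert(size >= sizeof(int));
    assert(size >= sizeof(long long));
    return size;
}

/* Returns 1 if every member starts at the address of the union itself. */
int members_share_address(void)
{
    union sample u;

    return (void *)&u.x == (void *)&u
        && (void *)&u.y == (void *)&u
        && (void *)&u.z == (void *)&u;
}
```

Calling `sample_size()` from your own `main` and printing the result with `%zu` shows the value for your platform.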

Side note: As noted by Stefano, the actual space any type (union, struct, class) will take does depend on other issues such as alignment by the compiler. I didn't go through this for simplicity as I just wanted to tell that a union takes the biggest item into account. It's important to know that the actual size does depend on alignment.

edited May 23 '17 at 12:02

Community


answered Apr 11 '09 at 18:28

Mehrdad Afshari


One place where sizeof might return something larger is when a long double is being used. A long double is 10-bytes but Intel recommends aligning at 16-bytes. – dreamlax Apr 15 '09 at 23:14
long double is ... well, depends on the compiler. I *think* PowerPC compilers use 128bit long doubles – Mehrdad Afshari Apr 15 '09 at 23:15
Yeah whoops I meant to say a long double on x86. – dreamlax Apr 16 '09 at 01:08

score 36 · Accepted Answer · edited Oct 17 '19 at 18:07

The Standard answers all questions in section 9.5 of the C++ standard, or section 6.5.2.3 paragraph 5 of the C99 standard (or paragraph 6 of the C11 standard, or section 6.7.2.1 paragraph 16 of the C18 standard):

In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time. [Note: one special guarantee is made in order to simplify the use of unions: If a POD-union contains several POD-structs that share a common initial sequence (9.2), and if an object of this POD-union type contains one of the POD-structs, it is permitted to inspect the common initial sequence of any of POD-struct members; see 9.2. ] The size of a union is sufficient to contain the largest of its data members. Each data member is allocated as if it were the sole member of a struct.

That means all members share the same memory region. At most one member is active, but you can't find out which one. You have to store that information about the currently active member yourself somewhere else. Storing such a flag alongside the union (for example, a struct with an integer as the type flag and a union as the data store) gives you a so-called "discriminated union": a union that knows which of its types is currently the "active one".

One common use is in lexers, where you can have different tokens and, depending on the token, different information to store (line is put into each struct to show what a common initial sequence is):

struct tokeni {
    int token; /* type tag */
    union {
        struct { int line; } noVal;
        struct { int line; int val; } intVal;
        struct { int line; struct string val; } stringVal;
    } data;
};

The Standard allows you to access line of each member, because that's the common initial sequence of each one.
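
As a rough sketch of how that guarantee gets used, the following C code stores an int token and then reads `line` through a different struct member. The names `token`, `TOK_NONE`, `TOK_INT`, `make_int_token` and `token_line` are assumptions made for this example, and the `stringVal` case is left out because `struct string` is not defined in the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the original only shows an int tag. */
enum token_kind { TOK_NONE, TOK_INT };

struct token {
    int kind; /* type tag */
    union {
        struct { int line; } noVal;
        struct { int line; int val; } intVal;
    } data;
};

/* Builds a token whose active member is intVal. */
struct token make_int_token(int line, int val)
{
    struct token t;

    t.kind = TOK_INT;
    t.data.intVal.line = line;
    t.data.intVal.val = val;
    return t;
}

/* `line` belongs to the common initial sequence of both structs, so it
   may be read through noVal whichever struct was stored last. */
int token_line(const struct token *t)
{
    assert(t != NULL);
    return t->data.noVal.line;
}
```

Reading `val`, by contrast, is only meaningful after checking that `kind` is `TOK_INT`.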

There exist compiler extensions that allow accessing all members disregarding which one currently has its value stored. That allows efficient reinterpretation of stored bits with different types among each of the members. For example, the following may be used to dissect a float variable into 2 unsigned shorts:

union float_cast { unsigned short s[2]; float f; };

That can come in quite handy when writing low-level code. If the compiler does not support that extension but you do it anyway, you write code whose results are not defined. So be certain your compiler supports it before you use that trick.
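
Here is a hedged sketch of that trick in C, next to a `memcpy` variant that does not rely on reading an inactive union member. The function names are invented for this example, and the code assumes a float is exactly as wide as two unsigned shorts (checked at compile time). As far as I know, C99 and later reinterpret the stored bytes when you read another member, whereas C++ gives no such guarantee, so there the `memcpy` form is the safer choice.

```c
#include <assert.h>
#include <string.h>

union float_cast { unsigned short s[2]; float f; };

/* Layout assumption for this sketch, verified by the compiler. */
static_assert(sizeof(float) == 2 * sizeof(unsigned short),
              "float must be as wide as two unsigned shorts");

/* Splits f by writing one union member and reading the other. */
void split_via_union(float f, unsigned short out[2])
{
    union float_cast u;

    u.f = f;
    out[0] = u.s[0];
    out[1] = u.s[1];
}

/* Splits f by copying its bytes, without reading an inactive member. */
void split_via_memcpy(float f, unsigned short out[2])
{
    memcpy(out, &f, sizeof f);
}
```

Which half ends up in `out[0]` depends on the byte order of the machine, so treat the halves as raw storage rather than as "high" and "low" parts.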

A horrible example of bad Standard language IMHO - in fact the whole section on unions seems a bit skimped. Why introduce the concept of "active" at all? — , Apr 11 '09 at 19:59
GCC at least explicitly supports cross reading of union members. and if the members are somehow related according to 3.10/15 or are layout-compatible, i'm pretty sure you can still read the other member even tho it's not the "active" one. — Johannes Schaub - litb, Apr 11 '09 at 20:15
It's the "active" bit that gets me. if 9.5\1 were to start with "the value of at most" then there would be no need to introduce this nebulous concept of "active". But this should (if anywhere) be on comp.lang.c++.std, and not in a godawful SO comment box! So I'm signing off on this topic. — , Apr 11 '09 at 20:57
haha, alright. start off a thread there and let's have fun :p — Johannes Schaub - litb, Apr 11 '09 at 21:18
@anon "active" is essential here because the type of the data in a union is dependent on what was last stored in it. Just because you don't understand the standard doesn't mean it's wrong. — Jim Balter, May 02 '14 at 10:51

Stefano Borini · Answer 3 · 2009-04-11T19:47:38.257

It depends on the compiler, and on the options.

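#include <stdio.h>

int main(void) {
  union {
    char all[13];
    int foo;
  } record;

  /* sizeof yields a size_t, so print it with %zu rather than %d */
  printf("%zu\n", sizeof(record.all));
  printf("%zu\n", sizeof(record.foo));
  printf("%zu\n", sizeof(record));

  return 0;
}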

This outputs:

13
4
16

If I remember correctly, it depends on the alignment that the compiler puts into the allocated space. So, unless you use some special option, the compiler will put padding into your union space.
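
The arithmetic behind the 16 can be sketched like this. The helper names `round_up`, `record_size` and `record_align` are made up for the example. The standard only requires the size to be a multiple of the alignment; that compilers pick the smallest such multiple is an assumption that holds on the common ABIs I know of. With a 4-byte alignment for int, 13 rounds up to 16, and with an alignment of 1 it would stay 13.

```c
#include <assert.h>
#include <stddef.h>

union record {
    char all[13];
    int foo;
};

/* Rounds n up to the next multiple of align (align must be non-zero). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* Size and alignment the compiler actually chose for the union. */
size_t record_size(void)  { return sizeof(union record); }
size_t record_align(void) { return _Alignof(union record); }
```

Comparing `record_size()` with `round_up(13, record_align())` on your own platform shows whether the padding follows this pattern.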

edit: with gcc you need to use a pragma directive

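#include <stdio.h>

int main(void) {
#pragma pack(push, 1)
      union {
           char all[13];
           int foo;
      } record;
#pragma pack(pop)

      /* sizeof yields a size_t, so print it with %zu rather than %d */
      printf("%zu\n", sizeof(record.all));
      printf("%zu\n", sizeof(record.foo));
      printf("%zu\n", sizeof(record));

      return 0;
}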

this outputs

13
4
13

You can also see it in the disassembly (some printf calls removed for clarity):

  0x00001fd2 <main+0>:    push   %ebp             |  0x00001fd2 <main+0>:    push   %ebp
  0x00001fd3 <main+1>:    mov    %esp,%ebp        |  0x00001fd3 <main+1>:    mov    %esp,%ebp
  0x00001fd5 <main+3>:    push   %ebx             |  0x00001fd5 <main+3>:    push   %ebx
  0x00001fd6 <main+4>:    sub    $0x24,%esp       |  0x00001fd6 <main+4>:    sub    $0x24,%esp
  0x00001fd9 <main+7>:    call   0x1fde <main+12> |  0x00001fd9 <main+7>:    call   0x1fde <main+12>
  0x00001fde <main+12>:   pop    %ebx             |  0x00001fde <main+12>:   pop    %ebx
  0x00001fdf <main+13>:   movl   $0xd,0x4(%esp)   |  0x00001fdf <main+13>:   movl   $0x10,0x4(%esp)                                         
  0x00001fe7 <main+21>:   lea    0x1d(%ebx),%eax  |  0x00001fe7 <main+21>:   lea    0x1d(%ebx),%eax
  0x00001fed <main+27>:   mov    %eax,(%esp)      |  0x00001fed <main+27>:   mov    %eax,(%esp)
  0x00001ff0 <main+30>:   call  0x3005 <printf>   |  0x00001ff0 <main+30>:   call   0x3005 <printf>
  0x00001ff5 <main+35>:   add    $0x24,%esp       |  0x00001ff5 <main+35>:   add    $0x24,%esp
  0x00001ff8 <main+38>:   pop    %ebx             |  0x00001ff8 <main+38>:   pop    %ebx
  0x00001ff9 <main+39>:   leave                   |  0x00001ff9 <main+39>:   leave
  0x00001ffa <main+40>:   ret                     |  0x00001ffa <main+40>:   ret

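The only difference is at main+13, where the packed version (left) passes 0xd (13) to printf as the value of sizeof(record) and the unpacked version (right) passes 0x10 (16). The stack allocation itself (sub $0x24,%esp) is the same in both.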

Yes, I suppose we should all have said "At _least_ as big as the largest contained type". — , Apr 11 '09 at 19:30
@Neil: compiler alignment is a totally different issue. It happens in structs too and also depends on the place you put the union in the struct. While this is certainly true, I think it just complicates the answer to *this* question. btw, I was careful to align my sample union to 8byte boundary :-p — Mehrdad Afshari, Apr 11 '09 at 20:01

score 12 · Answer 4 · answered Apr 11 '09 at 18:33

There is no notion of active datatype for a union. You are free to read and write any 'member' of the union: this is up to you to interpret what you get.

Therefore, the sizeof a union is always the sizeof its largest datatype.

mouviciel

You are of course wrong ... the language of the standard explicitly refers to the active datatype. However, sizeof is a compile-time operation and so of course does not depend on the active datatype. – Jim Balter May 02 '14 at 10:53
@JimBalter - You are correct about the standard. What I mean is that in C you can't query a union about its _active datatype_. Nothing prevents the coder from writing a float and reading an int (and getting garbage). – mouviciel May 02 '14 at 16:04
You said "There is no notion of active datatype for a union". You were wrong; own it. It won't do to claim that you meant something very different from what you wrote just to try to avoid having been wrong. " Nothing prevents the coder from writing a float and reading an int (and getting garbage)." -- Of course nothing prevents it ... the C Standard doesn't *prevent* anything; it only tells you whether that behavior is defined -- it isn't. As has been noted repeatedly, UB includes anything, even nuclear weapon detonation. For some people, that prevents them from coding UB. – Jim Balter May 02 '14 at 20:14
-1 for not stipulating whether you're talking about C or C++, which differ fundamentally in this regard of union type-punning. Reinterpretation of object byte representation is allowed in C http://stackoverflow.com/q/11639947/2757035 but **not** in C++. In the latter, it's pure UB, or implementation-defined if you'll settle for that (in `g++`, for instance, they use the C rules). – underscore_d Jun 14 '16 at 13:30

score 3 · Answer 5 · 2009-04-11T19:31:00.853

The size will be at least that of the largest composing type. There is no concept of an "active" type.

edited Apr 11 '09 at 19:31

answered Apr 11 '09 at 18:33

Except that yes, there is. – underscore_d Jun 14 '16 at 13:30

score 2 · Answer 6 · answered Apr 11 '09 at 18:49

You should really look at a union as a container for the largest datatype inside it combined with a shortcut for a cast. When you use one of the smaller members, the unused space is still there, but it simply stays unused.

You often see this used in combination with ioctl() calls in Unix, where the calls all pass the same struct, which contains a union of all possible responses. For example, the following comes from /usr/include/linux/if.h. This struct is used in ioctl() calls for configuring or querying the state of an Ethernet interface, and the request parameter defines which part of the union is actually in use:

struct ifreq 
{
#define IFHWADDRLEN 6
    union
    {
        char    ifrn_name[IFNAMSIZ];        /* if name, e.g. "en0" */
    } ifr_ifrn;

    union {
        struct  sockaddr ifru_addr;
        struct  sockaddr ifru_dstaddr;
        struct  sockaddr ifru_broadaddr;
        struct  sockaddr ifru_netmask;
        struct  sockaddr ifru_hwaddr;
        short   ifru_flags;
        int ifru_ivalue;
        int ifru_mtu;
        struct  ifmap ifru_map;
        char    ifru_slave[IFNAMSIZ];   /* Just fits the size */
        char    ifru_newname[IFNAMSIZ];
        void *  ifru_data;
        struct  if_settings ifru_settings;
    } ifr_ifru;
};

score 1 · Answer 7 · answered Sep 12 '16 at 12:23

What is the sizeof the union in C/C++? Is it the sizeof the largest datatype inside it?

Yes, the size of the union is the size of its biggest member (plus any padding the compiler adds for alignment).

For example:

#include<stdio.h>

union un
{
    char c;
    int i;
    float f;
    double d;
};

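int main(void)
{
    union un u1;
    /* sizeof yields a size_t, so use %zu rather than %ld */
    printf("sizeof union u1 : %zu\n", sizeof(u1));
    printf("sizeof double d : %zu\n", sizeof(u1.d));
    return 0;
}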

Output :

sizeof union u1 : 8
sizeof double d : 8

Here the biggest member is the double. Both it and the union have size 8, so, as sizeof reports, the size of the union is indeed 8.

how does the compiler calculate how to move the stack pointer if one of the smaller datatype of the union is active?

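This is handled internally by the compiler. sizeof is evaluated at compile time, so the compiler always reserves space for the whole union, whichever member is in use, and the stack pointer moves by the same amount either way. All members share the same memory, so only one of them holds a meaningful value at a time: after writing one member, you should not expect a sensible value from another. This sharing is how a union saves space.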
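
A small sketch of why the active member cannot influence the size: sizeof is a constant expression, so it can even size an array at file scope, and it gives the same value before and after any member is written. The names `storage`, `storage_size` and `size_after_writes` are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

union un {
    char c;
    int i;
    float f;
    double d;
};

/* sizeof is a constant expression, so it can size a file-scope array. */
static unsigned char storage[sizeof(union un)];

size_t storage_size(void) { return sizeof storage; }

/* Writes a small member, then a large one, and checks that the size
   reported for the object never changes. */
size_t size_after_writes(void)
{
    union un u;
    size_t before, after_char, after_double;

    before = sizeof u;
    u.c = 'A';
    after_char = sizeof u;
    u.d = 3.14;
    after_double = sizeof u;

    assert(before == after_char && after_char == after_double);
    (void)u;
    return after_double;
}
```

Because the value is fixed at compile time, the space reserved for a local union is the same no matter which member the code goes on to use.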

score 0 · Answer 8 · edited Apr 16 '09 at 00:02

The size of the largest member.
This is why unions usually make sense inside a struct that has a flag that indicates which is the "active" member.

Example:

struct ONE_OF_MANY {
    enum FLAG { FLAG_SHORT, FLAG_INT, FLAG_LONG_LONG } flag;
    union { short x; int y; long long z; };
};
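
A possible way to use such a struct, sketched in C11 (anonymous unions need C11 or a compiler extension). The helpers `make_int` and `as_long_long` are hypothetical names for this example; the point is that code reads only the member the flag names.

```c
#include <assert.h>
#include <stddef.h>

struct ONE_OF_MANY {
    enum FLAG { FLAG_SHORT, FLAG_INT, FLAG_LONG_LONG } flag;
    union { short x; int y; long long z; }; /* anonymous union */
};

/* Stores an int and records that fact in the flag. */
struct ONE_OF_MANY make_int(int v)
{
    struct ONE_OF_MANY m;

    m.flag = FLAG_INT;
    m.y = v;
    return m;
}

/* Widens the active member to long long, reading only the member
   that the flag says was stored. */
long long as_long_long(const struct ONE_OF_MANY *m)
{
    assert(m != NULL);
    switch (m->flag) {
    case FLAG_SHORT:     return m->x;
    case FLAG_INT:       return m->y;
    case FLAG_LONG_LONG: return m->z;
    }
    return 0; /* not reached for a valid flag */
}
```

Keeping every write behind a helper like `make_int` is one way to stop the flag and the stored member from drifting apart.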

edited Apr 16 '09 at 00:02

Mehrdad Afshari

answered Apr 11 '09 at 19:16

isekaijin

Not true. A common use is to access smaller parts of a larger type. Example: union U { int i; char c[4]; }; can be used to give (implementation specific) access to bytes of a 4-byte integer. – Apr 11 '09 at 19:39
Oh, true... I haven't noticed that possibility. I have always accessed parts of a larger type using byte shifts and that kind of things. – isekaijin Apr 12 '09 at 00:54
@anon - implementation specific or simply UB, depending on your compiler. Relying even on the former is bad practice if it can be avoided. – underscore_d Jun 14 '16 at 13:31
