Using UNIONs and STRUCTUREs

Question

Is this approach correct?

struct netinfo
{
   // a lot of members here...
   union
   {
     UINT type;
     struct
     {
        UINT   _type;    // a place holder (obviously who access this struct already knows the type is ipv4)
        UINT32 saddr;
        UINT32 daddr;
        UINT   ttl.
     } ip4;
     struct
     {
       UINT  _type;        // a place holder (obviously who access this struct already knows the type is ipv6)
       UINT8 saddr[16];
       UINT8 daddr[16];
     } ip6;
  }  protocol;
  // a lot of members here...
};

struct netinfo my_netinfo;

my_netinfo.protocol.type = NETTYPE__IPV4; // note that i used the "protocol" union as the "type"  member
my_netinfo.protocol.ip4.saddr = addr1;     // now i'm accessing the union as the  "ip4" member
my_netinfo.protocol.ip4.daddr = addr2;
my_netinfo.protocol.ip4.ttl = 64;

// ...

// later, i pretend to acess protocol.type and protocol.ip4.X, the way i want

Note that my intention is access it as, AT THE SAME TIME:

my_netinfo.protocol.type and my_netinfo.protocol.ip4

I will not use my_netinfo.protocol.ip4.type, because by accessing my_netinfo.protocol.ip4, I already know i'm dealing with IPv4.

The question is:

By using the _type in the structures, as its first members, will this approach be valid and a good idea?

union { int x; struct { int _padding_for_x; int a; int b; } y };

This way union_namex, union_name.y.a, union_namey.y.b, union_name.x again...

Can I write/read/write/read again all of them in any order?

A union lets you access one area of memory in two different ways. An example of this that is semi useful is to union a 4byte int with 4 single byte characters, to let you read each byte of the int. Your structs are all of different size, so writing to one part of the union would then make reading from a different part of it give corrupt data. I'm not sure what you're really looking for with your code, but I'm pretty sure a union isn't it. — Scott Mermelstein, Jun 22 '14 at 07:05
I see no reason why you should use type declaration in the inner structures. You might consider changing the outer union to struct with a `type` and `header` fields (the first being a tag type and the second a union). — Eran, Jun 22 '14 at 07:06
I agree with @Eran, you can add a `UINT type` to `netinfo` and remove the redundancy of `protocol.type`, `protocol.ip4._type`, and `protocol.ip6._type`. — jmstoker, Jun 22 '14 at 07:20
That's because I have the physical layer (ethernet), the addressing (ipv4, ipv6...) and the transport layer (tcp, udp...) And so I have phys_type, addr_type, transp_type... and so i was looking for a better way to store them. But you are all right. So i will just move those XYZ_type's and use as struct { uint phys_type; union { struct ethernet; } phys; uint addr_type; union { struct ip4; struct ip6; } addr; uint transp_type; union { struct tcp4; struct udp4; struct tcp6; struct udp6; } transp; }; — user3525723, Jun 22 '14 at 19:47

score 2 · Answer 1 · answered Jun 22 '14 at 08:10

In addition to the comments below your question about removing redundant UINT type, it may make more sense to use two different structs for ip4 and ip6 rather then attempting to make them members of the netinfo union. Here is why. While a union does not require the members to be of equal size, the union will only make use of memory large enough to hold the largest member size. See: glibc reference manual That is due to the union only holding one set of data at a time. According to your question, that may not work:

Note that my intention is access it as, AT THE SAME TIME:

You can access both netinfo.ip4 and netinfo.ip6 AT THE SAME TIME, but unless your code will behave correctly for both ip4 and ip6 regardless of which was used to fill the union immediately before the read, your code will not work. Recall, you can access the same data through both datatype contained in the union, but there is no guarantee what you access will make sense for both types if the data is not the same for both types. That leads to your final questions:

Can I write/read/write/read again all of them in any order?

You can write to the union any time you want, but you will only get useful data back though both types at the same time if and only if, the data stored in the union is valid for both ip4 and ip6 at the same time. Which given the storage differences for the members of ip4 and ip6 is not likely.

So you can store ip4 and readip4 back, you can store ip6 and read ip6 back, but you cannot store ip4.saddr and then expect to read ip6.saddr back without problems. Separate structs outside of a union will be much less problematic.

score 1 · Accepted Answer · answered Jun 22 '14 at 10:22

The answer is no and no.

The standard requires that only one of the members of a union be active at a time. There is an exemption that would allow you to inspect the common initial sequence (UINT _type) but the other members cannot be active at the same time. A program that does so is ill-formed (which is just as bad as Undefined Behaviour).

Not only is it theoretically bad, but practically bad as well. The alignment possibilities make it almost impossible to predict the layout of that union. Your data is in there somewhere, but only the compiler and its maker know where.

So no, it isn't a good idea either.

Although everyone have correctly answered, you got the point of my question - the use of a common member (_type) as the first one in the structure, to make it possible to read/write this UINT as union.type or union.structure._type, and access the same thing. But, as I expected, it's really a bad idea. (it WILL work, but that is just a fact, it won't be following the standards). — user3525723, Jun 22 '14 at 19:39

score 0 · Answer 3 · edited May 23 '17 at 12:27

Short answer

Like I propose in the comment to the question: use a union which contains separate structure tags. Using union, you'll make better use of space, and using separate structure tags will avoid pointer aliasing. This is theoretically sound and actually a common practice.

Longer answer

Since you present a network code written in C, I assume you aim for performance. So you don't want to waste extra space and time on copying buffers. That's the motivation for using a union, which is only as large as the largest type it contains, and allows you to treat a memory location as a sum type (either ipv4 OR ipv6). There are some pitfalls you should watch for when optimizing your low level data shuffling.

Pointer aliasing occurs when you access the same memory location from different pointer types. In your case, you may access a network message buffer from a pointer to an IPv4 message or a pointer to an IPv6 message. This is naughty, because when compiler optimization and multi-threading come into play, you get decreased performance and/or very strange bugs.

I recommend that you read this SO community wiki for a good explanation of the strict aliasing rule. It also warns about other things you should think about, such as endianess, alignment and packing.

Moving on. Luckily, pointers to union types with differing tags do not alias. That's why using a union with differing tags is sound. That been said, trying to read IPv4 data from a memory location that holds IPv6 is still a stupid idea that the compiler won't help you avoid. So don't do it.

This code should work (full disclosure - I didn't try to compile it):

struct netinfo
{
   // a lot of members here...
   struct
   {
     UINT type;
     union 
     {
         struct
         {
            UINT32 saddr;
            UINT32 daddr;
            UINT   ttl.
         } ip4;
         struct
         {
           UINT8 saddr[16];
           UINT8 daddr[16];
         } ip6;
     } data;
  }  protocol;
  // a lot of members here...
};

Using UNIONs and STRUCTUREs

3 Answers3

Short answer

Longer answer

Other links you may find useful