Initialize union with nested structs

Question

I am porting C99 code to C++ (14 or 17), and the list initializer is used in many places. I am now getting compilation errors and would like to know the simplest way to initialize a union that contains a nested struct. For instance, the following snippet works just fine in C:

#include <stdio.h>
#include <stdint.h>

typedef union Word_t
{
    uint32_t word32Bits;
    struct
    {
        uint16_t leastSignificant16Bits;
        uint16_t mostSignificant16Bits;
    };
} Word_t;

int main()
{
    Word_t w1 = (Word_t) {.word32Bits = 0x1234ABCD};
    printf("%x\n", w1.word32Bits);

    Word_t w2 = (Word_t) {.mostSignificant16Bits = 0x1234, .leastSignificant16Bits = 0xABCD};
    printf("%x\n", w2.word32Bits);

    return 0;
}

$ gcc test.c --std=c99 -o a && ./a
1234abcd
1234abcd

However, in C++ it does not compile:

#include <stdio.h>
#include <stdint.h>

typedef union Word_t
{
    uint32_t word32Bits;
    struct
    {
        uint16_t leastSignificant16Bits;
        uint16_t mostSignificant16Bits;
    } _word16Bits;
} Word_t;

int main()
{
    Word_t w1 = (Word_t) {.word32Bits = 0x1234ABCD};
    printf("%x\n", w1.word32Bits);

    Word_t w2 = (Word_t) {.mostSignificant16Bits = 0x1234, .leastSignificant16Bits = 0xABCD};
    printf("%x\n", w2.word32Bits);

    return 0;
}


```bash
$ g++ test.c --std=c++14 -o a && ./a
test.c: In function ‘int main()’:
test.c:57:92: error: ‘Word_t’ has no non-static data member named ‘mostSignificant16Bits’
     Word_t w2 = (Word_t) {.mostSignificant16Bits = 0x1234, .leastSignificant16Bits = 0xABCD};
```

The workaround I found is to zero-initialize the union and then set the internal values of the struct, as follows:


int main()
{
    Word_t w1 = (Word_t) {.word32Bits = 0x1234ABCD};
    printf("%x\n", w1.word32Bits);

    Word_t w2 = {0};
    /* positional: leastSignificant16Bits first, then mostSignificant16Bits */
    w2._word16Bits = {0xABCD, 0x1234};
    printf("%x\n", w2.word32Bits);

    return 0;
}

This works, but it doesn't let me explicitly say .mostSignificant16Bits = 0x1234, for instance, which I think is quite useful, especially when reading the code.

I tried a couple of things, like defining static members and creating a user-defined constructor, but I still have no idea how to simplify the refactoring I am about to do. Ideally, I would like to leave the variable declaration as it is, Word_t w2 = (Word_t) {.mostSignificant16Bits = 0x1234, .leastSignificant16Bits = 0xABCD}, and make all the changes in the definition of Word_t.

C and C++ are two very different languages. Unions is one area where they differ a lot. — Some programmer dude, Nov 06 '19 at 17:33
C and C++ are different languages. It's useful to get into the habit of remembering that what works in one is not guaranteed to work in the other. This only gets more true as time goes on and the languages diverge even more. Pretend you are working with Python and Java instead, and attack the problem with each language's own idioms. — NathanOliver, Nov 06 '19 at 17:37

score 3 · Accepted Answer · edited Nov 06 '19 at 20:48

Syntactic issues

Designated initializers in aggregate initialization are formally part of the C++20 standard.

However, they come with serious constraints compared to C99:

They have to appear in the same order as in the declaration;
All the designated elements must be direct members of the aggregate;
Nesting is possible only if the initializers themselves are nested.

In your case, the following would compile, but it would not give you the flexibility that you expect from the explicit naming:

Word_t w2 = (Word_t) {._word16Bits  { .leastSignificant16Bits = 0xABCD, .mostSignificant16Bits = 0x1234} };
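To make the constraints listed above concrete, here is a minimal sketch. The types Halves and Packet and the function makePacket are hypothetical names invented for this illustration, not part of the question's code, and the block assumes a compiler with C++20 designated initializers.

```cpp
#include <cstdint>

// Hypothetical aggregates, used only to illustrate the C++20 rules.
struct Halves {
    std::uint16_t leastSignificant16Bits;
    std::uint16_t mostSignificant16Bits;
};

struct Packet {
    std::uint8_t tag;
    Halves halves;
};

// Accepted: designators follow the declaration order, and the nested
// aggregate gets its own braced initializer.
inline Packet makePacket() {
    return Packet{.tag = 1,
                  .halves = {.leastSignificant16Bits = 0xABCD,
                             .mostSignificant16Bits = 0x1234}};
}

// Not allowed by the C++20 rules, although C99 accepts both forms:
//   Packet{.halves = {}, .tag = 1};                            // out of order
//   Packet{.tag = 1, .halves.mostSignificant16Bits = 0x1234};  // nested designator
```

Some compilers accept the commented-out forms as extensions, so building with -pedantic-errors is a reasonable way to check what the standard actually permits.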

More serious issues

First, even if this code worked, it would not be portable: it assumes that the target architecture is little-endian.

Second, and this is crucial here, C++ puts strong constraints on unions. They are needed to keep object lifetimes consistent. In particular:

[class.union]/1: In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

So if you construct the union with one member active (the one used in your initializer), the other member is not active and you should not access it. The only exception foreseen is not applicable to your case:

[ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members; — end note ]

The standard also gives a hint about how to change the active member of a union:

[ Note: In general, one must use explicit destructor calls and placement new operators to change the active member of a union. — end note ]

That said, for simple scalar types it may compile and work as you expect on most mainstream compilers.

But the point is that your use of unions is incompatible with the standard. It's UB: even if it works on some implementations now, you have no guarantee that it will keep working with the next compiler release, which puts your whole investment at risk.
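If the goal is really to look at the same bytes through two types, one way to do that in C++ without reading an inactive union member is to copy the object representation with std::memcpy. The sketch below is an illustration and not code from the question: Halves, splitByCopy and joinByCopy are hypothetical names, and the block assumes that the two-member struct has no padding (checked by the static_assert).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: two 16-bit halves in memory order.
struct Halves {
    std::uint16_t first;   // lower address
    std::uint16_t second;  // higher address
};

static_assert(sizeof(Halves) == sizeof(std::uint32_t),
              "this sketch assumes Halves has no padding");

// Copy the bytes of a 32-bit word into a Halves object.
inline Halves splitByCopy(std::uint32_t word) {
    Halves halves;
    std::memcpy(&halves, &word, sizeof halves);
    return halves;
}

// Copy the bytes of a Halves object back into a 32-bit word.
inline std::uint32_t joinByCopy(Halves halves) {
    std::uint32_t word;
    std::memcpy(&word, &halves, sizeof word);
    return word;
}
```

This removes the active-member problem but not the portability one: whether first or second holds the most significant bits still depends on the byte order of the target.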

Why it is time to change approach?

C99 places fewer constraints on unions than C++. But it gives no firm guarantee either about the value that might be read from one union member after another union member was set:

6.2.6.1/7: When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

Annex J of C99 lists this use of unions as unspecified behavior that can create portability issues.
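One possible change of approach, sketched here under my own assumptions, is to drop the union and build the 32-bit value with shifts and masks, which behave the same on little-endian and big-endian targets. The wrapper types MostSignificant16Bits and LeastSignificant16Bits and the function makeWord are hypothetical names chosen so that call sites still spell out which half is which. The block only needs C++14.

```cpp
#include <cstdint>

struct Word {
    std::uint32_t word32Bits;
};

// Hypothetical wrapper types that keep the names visible at the call site.
struct MostSignificant16Bits  { std::uint16_t value; };
struct LeastSignificant16Bits { std::uint16_t value; };

// Combine the two halves arithmetically; no byte-order assumption is needed.
constexpr Word makeWord(MostSignificant16Bits ms, LeastSignificant16Bits ls) {
    return Word{(static_cast<std::uint32_t>(ms.value) << 16) | ls.value};
}

// Extract the upper 16 bits.
constexpr std::uint16_t mostSignificant16Bits(Word w) {
    return static_cast<std::uint16_t>(w.word32Bits >> 16);
}

// Extract the lower 16 bits.
constexpr std::uint16_t leastSignificant16Bits(Word w) {
    return static_cast<std::uint16_t>(w.word32Bits & 0xFFFFu);
}

static_assert(makeWord(MostSignificant16Bits{0x1234},
                       LeastSignificant16Bits{0xABCD}).word32Bits == 0x1234ABCDu,
              "halves are combined by value, not by memory layout");
```

A declaration such as Word w2 = makeWord(MostSignificant16Bits{0x1234}, LeastSignificant16Bits{0xABCD}); stays readable, although, unlike the C99 initializer, the argument order is fixed.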

Thanks for the detailed explanation! — Berthin, Nov 07 '19 at 08:31

Bodo · Answer 2 · 2019-11-07T08:43:47.237

This works in C and C++:

#include <stdio.h>
#include <stdint.h>

typedef union Word_t
{
    uint32_t word32Bits;
    struct
    {
        uint16_t leastSignificant16Bits;
        uint16_t mostSignificant16Bits;
    } _word16Bits;
} Word_t;

int main()
{
    Word_t w1 = {.word32Bits = 0x1234ABCD};
    printf("%x\n", w1.word32Bits);

    Word_t w2 = {._word16Bits={.leastSignificant16Bits = 0xABCD, .mostSignificant16Bits = 0x1234}};
    printf("%x\n", w2.word32Bits);

    return 0;
}

Edit: --std=c++2a instead of --std=c++14 is needed for designated initializers.

$ g++ -Wall -Wextra test.c --std=c++2a -pedantic -pedantic-errors -o a && ./a
1234abcd
1234abcd
$ gcc -Wall -Wextra test.c -pedantic -pedantic-errors -o a && ./a
1234abcd
1234abcd

Note: In C++ you have to specify the tagged initializers in the same order as in the structure declaration.

As mentioned in Ted Lyngmo's comment, writing to one union member and reading from another is undefined behavior in C++.

In any case using a union to extract shorter parts from a longer data type or combine them is implementation-dependent. On a big-endian system, the order of leastSignificant16Bits and mostSignificant16Bits would be reversed.

Writing to one union member and reading from another is UB in C++. Also, `--std=c++14` can't be right? It looks like C++20. Edit: https://godbolt.org/z/hgfQjt — Ted Lyngmo, Nov 06 '19 at 18:18
@bodo Thanks for pointing out the initialization order! It compiled with no errors on my PC when I tested it :). I am accepting Christophe's answer, though, since he elaborates a little bit more. — Berthin, Nov 07 '19 at 08:29
