4

Basically, I have a

struct foo {
        /* variable denoting active member of union */
        enum whichmember w;
        union {
                struct some_struct my_struct;
                struct some_struct2 my_struct2;
                struct some_struct3 my_struct3;
                /* let's say that my_struct is the largest member */
        };
};

int main(void)
{
        /*...*/
        /* earlier in main, we get some struct foo d with an */
        /* unknown union assignment; d.w is correct, however */
        struct foo f;
        f.my_struct = d.my_struct; /* my_struct isn't necessarily the */
                                /* active member, but is the biggest */
        f.w = d.w;
        /* code that determines which member is active through f.w */
        /* ... */
        /* we then access the *correct* member that we just found */
        /* say, f.my_struct3 */

        f.my_struct3.some_member_not_in_mystruct = /* something */;
}

The question "Accessing C union members via pointers" seems to say that accessing the members via pointers is okay. See comments.

But my question concerns directly accessing them. Basically, if I write all the information that I need to the largest member of the union and keep track of types manually, will accessing the manually specified member still yield the correct information every time?

occamsrazor
  • You are misreading the answer to the linked question. The main point is `It is unspecified (subtly different from undefined) behaviour to access a union by any element other than the one that was last written. That's detailed in C99 annex J`. Whether you `access` it via pointer vs. directly is irrelevant. – dxiv Dec 10 '15 at 02:08
  • yup. totally misread it. Not my question at all. – occamsrazor Dec 10 '15 at 02:18
  • @dxiv it's only unspecified insofar as bytes that were not part of the last object written take on unspecified values – M.M Dec 10 '15 at 02:37
  • @M.M You're right, and the 6.2.6.1.7 quoted in the latest answer makes it more clear than the annex J referenced in the other post. – dxiv Dec 10 '15 at 02:59
  • @occamsrazor Sorry, I misread your question indeed. It might perhaps work better to reverse the last line and write it as `/* something */ = f.my_struct3.some_member_not_in_mystruct;` since the way it is now it's not really matching the title `assign a value to one union member, read from another`. – dxiv Dec 10 '15 at 03:04

3 Answers

8

I note that the code in the question uses an anonymous union, which means that it must be written for C11; anonymous unions were not a part of C90 or C99.

ISO/IEC 9899:2011, the current C11 standard, has this to say:

§6.5.2.3 Structure and union members

¶3 A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

¶4 A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points, and is an lvalue.96) If the first expression is a pointer to a qualified type, the result has the so-qualified version of the type of the designated member.

¶5 …

¶6 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.


95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

96) If &E is a valid pointer expression (where & is the ‘‘address-of’’ operator, which generates a pointer to its operand), the expression (&E)->MOS is the same as E.MOS.

Italics as in the standard

And section §6.2.6 Representations of types says (in part):

§6.2.6.1 General

¶6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

¶7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.


51) Thus, for example, structure assignment need not copy any padding bits.


My interpretation of what you're doing is that footnote 51 says "it might not work", because you may have assigned only part of the structure. You're treading on thin ice, at best. Against that, you stipulate that the assigned structure (in the f.my_struct = d.my_struct; assignment) is the largest member, so the chances are moderately high that it won't go wrong. But if the padding bytes in the two structures (the active member of the union and the largest member of the union) are in different places, then things could go wrong. If you then reported a problem to the compiler writer, the compiler writer would simply tell you "don't contravene the standard".

So, to the extent I'm a language lawyer, this language lawyer's answer is "It is not guaranteed". In practice, you're unlikely to run into problems, but the possibility is there and you have no comeback on anyone.

To make your code safe, simply use f = d;, which assigns the whole structure, including the union.


Illustrative Example

Suppose that the machine requires double to be aligned on an 8-byte boundary with sizeof(double) == 8, int to be aligned on a 4-byte boundary with sizeof(int) == 4, and short to be aligned on a 2-byte boundary with sizeof(short) == 2. This is a plausible and even common set of sizes and alignment requirements.

Further, suppose that you have a two-structure union variant of the structure in the question:

struct Type_A { char x; double y; };
struct Type_B { int a; short b; short c; };
enum whichmember { TYPE_A, TYPE_B };

struct foo
{
    enum whichmember w;
    union
    {
        struct Type_A s1;
        struct Type_B s2;
    };
};

Now, under the sizes and alignments specified, the struct Type_A will occupy 16 bytes, and struct Type_B will occupy 8 bytes, so the union will use 16 bytes too. The layout of the union will be like this:

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | p...a...d...d...i...n...g |               y               |  s1
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|       a       |   b   |   c   |   p...a...d...d...i...n...g   |  s2
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The w element would also mean that there are 8 bytes in struct foo before the (anonymous) union, of which it is likely that w only occupies 4. The size of struct foo is therefore 24 on this machine. That's not particularly relevant to the discussion, though.

Now suppose we have code like this:

struct foo d;
d.w = TYPE_B;
d.s2.a = 1234;
d.s2.b = 56;
d.s2.c = 78;

struct foo f;
f.s1 = d.s1;
f.w  = TYPE_B;

Now, under the ruling of footnote 51, the structure assignment f.s1 = d.s1; does not have to copy the padding bits. I know of no compiler that behaves like this, but the standard says that a compiler need not copy the padding bits. That means that the value of f.s1 could be:

+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | g...a...r...b...a...g...e |   r...u...b...b...i...s...h   |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The garbage is because those 7 bytes need not have been copied (footnote 51 says that is an option, even though it is not likely to be an option exercised by any current compiler). The rubbish is because the initialization of d never set any values in those bytes; the contents of that part of the structure are unspecified.

If you now go ahead and try to treat f as a copy of d, you might be a little surprised to find that only 1 byte of the 8 relevant bytes of f.s2 is actually initialized.

I'll reemphasize: I know of no compiler that would do this. But the question is tagged 'language lawyer' so the issue is 'what does the language standard state' and this is my interpretation of the quoted sections of the standard.

Jonathan Leffler
  • So help me out with memory layouts for a second, so that I may better understand your post. – occamsrazor Dec 10 '15 at 02:21
  • So you're saying that if, for example, `my_struct` is just a struct of three `int`s, `my_struct2` is 2 `int`s, and `my_struct3` is one `int`, they're not necessarily laid out like this: `[int][int][int]`, `[int][int][0]`, and `[int][0][0]` but could instead be like this (for `my_struct2` and `my_struct3`): `[0][int][int]` and `[0][0][int]`? – occamsrazor Dec 10 '15 at 02:27
  • Footnote 51 is saying that it *does* work to copy the union as a whole by assignment. This is well-defined. – M.M Dec 10 '15 at 02:39
  • 1
    @occamsrazor: see the update. In the simple cases of uniform types within the structure, you're on less dangerous grounds, but with non-uniform sizes and padding, you could (but probably won't) get unexpected results. – Jonathan Leffler Dec 10 '15 at 04:07
  • That was beautiful. Thank you for your help. Although your solution (stated in the first half of your post) is good, it would require me to refactor much of the code that I wrote for this project (and sacrifice clarity). Instead, I propose an alternative: what if I created a dummy union member (in addition to `s1` and `s2`) that forced alignment (i.e., exactly 16 bytes long), and used it for assignment only (as in `f.dummy = d.dummy`)? – occamsrazor Dec 10 '15 at 04:16
  • If you used `struct dummy { unsigned char space[MAX(sizeof(struct Type_A), sizeof(struct Type_B))]; };`, and had `struct dummy dummy;` in the union, and then used `f.dummy = d.dummy;`, then you should be OK. That requires all bytes of the dummy structure to be copied; it is unlikely that a compiler will screw you over it. – Jonathan Leffler Dec 10 '15 at 04:21
1

Yes, your code will work, because with a union the compiler shares the same memory space among all the members.

For example, if &f.my_struct == 100, then &f.my_struct2 == 100 and &f.my_struct3 == 100.

If my_struct is the largest one, then it will work all the time.

0

Yes you can directly access them. You can assign a value to a union member and read it back through a different union member. The result will be deterministic and correct.

nicomp
  • Is it non-standard C? – occamsrazor Dec 10 '15 at 01:43
  • @occamsrazor I don't understand the question. – nicomp Dec 10 '15 at 01:44
  • Is the assigning of a value to a union member and the reading from a different union member defined in the C standard? I.e., will the compiler complain? – occamsrazor Dec 10 '15 at 01:45
  • @occamsrazor The purpose of a union, well, one purpose, is to access the same memory in different ways. I don't know all the compilers but there are probably some that will whine a little. – nicomp Dec 10 '15 at 01:47
  • 1
    In some circumstances the result isn't "deterministic", e.g. writing a `char` and reading back an `int`. – M.M Dec 10 '15 at 02:01
  • @M.M You mean, because you are writing only one byte and reading (typically) four, right? I mean, three bytes worth of "garbage" remain? – Nicolas Miari Dec 10 '15 at 02:07
  • 2
    Please cite the sections of the standard you use to justify this answer. And explain why you think they say what you said they say. – Jonathan Leffler Dec 10 '15 at 02:09