C: Is accessing initial member of nested struct using pointer cast to "outer" struct type defined?

Question

I'm trying to understand the so-called "common initial sequence" rule for C aliasing analysis. This question does not concern C++.

Specifically, according to resources (for example the CPython PEP 3123),

[A] value of a struct type may also be accessed through a pointer to the first field. E.g. if a struct starts with an int, the struct * may also be cast to an int *, allowing to write int values into the first field.

(emphasis mine).

My question can be roughly phrased as "does the ability to access a struct by pointer to first-member-type pierce nested structs?" That is, what happens if access is via a pointer whose pointed-to type (say, struct A) isn't exactly the type of the first member (say, struct B), but struct A shares a common initial sequence with struct B, and the "underlying" access touches only that common initial sequence?

(I'm chiefly interested in structs, but I can imagine this question may also pertain to unions, although I imagine unions come with their own tricky bits w.r.t. aliasing.)

This phrasing may not be clear, so I tried to illustrate my intention with the following code (also available at godbolt.org; the code seems to compile just fine with the intended effect):

/* Base object as first member of extension types. */
struct base {
    unsigned int flags;
};

/* Types extending the "base" by including it as first member */
struct file_object {
    struct base attr;
    int index;
    unsigned int size;
};

struct socket_object {
    struct base attr;
    int id;
    int type;
    int status;
};

/* Another base-type with an additional member, but the first member is
 * compatible with that of "struct base" */
struct extended_base {
    unsigned int flags;
    unsigned int mode;
};

/* A type that derives from extended_base */
struct extended_socket_object {
    struct extended_base e_attr;  /* Using "extended" base here */
    int e_id;
    int e_type;
    int e_status;
    int some_other_field;
};

/* Function intended for structs "deriving from struct base" */
unsigned int set_flag(struct base *objattr, unsigned int flag)
{
    objattr->flags |= flag;
    return objattr->flags;
}

extern struct file_object *file;
extern struct socket_object *sock;
extern struct extended_socket_object *esock;

void access_files(void)
{
    /* Cast to pointer-to-first-member-type and use it */
    set_flag((struct base *)file, 1);

    set_flag((struct base *)sock, 1);

    /* Question: is the following access defined?
     * Notice that it's cast to (struct base *), rather than 
     * (struct extended_base *), although the two structs share the same common
     * initial member and it is this member that's actually accessed. */
    set_flag((struct base *)esock, 1);
    return;
}

Maybe related: https://stackoverflow.com/questions/66427774/best-practices-for-object-oriented-patterns-with-strict-aliasing-and-strict-alig but I don't think these are the same question; I'm more concerned with whether two "base" structs can be accessed in a more-or-less interchangeable manner. — Zoë the Scribe, Mar 25 '21 at 19:32
`base` and `extended_base` are two different structs. So the CPython quote isn't quite relevant. You are essentially asking if different structs that share the same initial members can be accessed through one another. — P.P, Mar 25 '21 at 19:39
Does this answer your question? [Are C-structs with the same members types guaranteed to have the same layout in memory?](https://stackoverflow.com/q/19804655/1275169) — P.P, Mar 25 '21 at 19:42

dbush · Answer 1 · 2021-03-25T21:42:55.303

This is not safe as you're attempting to access an object of type struct extended_base as though it were an object of type struct base.

However, there are rules that allow access to the common initial sequence of two structures via a union. From section 6.5.2.3p6 of the C standard:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members

So if you change the definition of struct extended_socket_object to this:

struct extended_socket_object {
    union u_base {
        struct base b_attr;
        struct extended_base e_attr;
    } attr;  /* member name added: a tagged union with no declarator declares no member in standard C */
    int e_id;
    int e_type;
    int e_status;
    int some_other_field;
};

Then a struct extended_socket_object * may be converted to union u_base * which may in turn be converted to a struct base *. This is allowed as per section 6.7.2.1 p15 and p16:

15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

16 The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

It is then allowed to access the flags member of b_attr through that pointer, because of the union it resides in, via 6.5.2.3p6.
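As a minimal sketch of this layout: the union member name `attr` and the helper `read_flags_via_base` are hypothetical, introduced only for illustration, and the struct is trimmed to one trailing field. The helper stores through `e_attr` and then inspects the common initial member through `b_attr`, at a point where the complete union type is visible:

```c
#include <assert.h>
#include <stddef.h>

struct base {
    unsigned int flags;
};

struct extended_base {
    unsigned int flags;
    unsigned int mode;
};

/* The union is given a member name ("attr", an assumption of this sketch)
 * so that it actually declares a member of the enclosing struct. */
struct extended_socket_object {
    union u_base {
        struct base b_attr;
        struct extended_base e_attr;
    } attr;
    int e_id;
};

/* Hypothetical helper: writes via e_attr, then reads the common initial
 * member via b_attr, always through the union member expression. */
unsigned int read_flags_via_base(struct extended_socket_object *obj,
                                 unsigned int flags)
{
    assert(obj != NULL);
    obj->attr.e_attr.flags = flags;
    obj->attr.e_attr.mode = 0;
    return obj->attr.b_attr.flags;
}
```

Keeping every access spelled as `obj->attr.<member>` is the conservative form: the union is visibly involved at the access site, which is the situation 6.5.2.3p6 describes.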

The cited paragraph 6.5.2.3p6 does not imply that a `struct extended_socket_object *` may be converted to `union u_base *`. — Armali, Mar 25 '21 at 21:28
@Armali Those are covered by 6.7.2.1 p15 and p16, though I didn't cite those because OP seemed to already understand that. I've included them for completeness. — dbush, Mar 25 '21 at 21:43

Orielno · Answer 2 · 2021-03-26T07:44:51.907

According to the C Standard (6.7.2.1 Structure and union specifiers, paragraph 13):

A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa.

So, converting esock to struct extended_base * and then converting it to unsigned int * must give us a pointer to the flags field, according to the Standard.
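A small sketch of that two-step conversion, using trimmed-down versions of the question's types; the helper name `flags_of` is hypothetical. Each cast goes from a struct pointer to a pointer to that struct's own first member:

```c
#include <assert.h>
#include <stddef.h>

struct extended_base {
    unsigned int flags;
    unsigned int mode;
};

struct extended_socket_object {
    struct extended_base e_attr;
    int e_id;
};

/* Hypothetical helper: reaches flags through two first-member conversions. */
unsigned int *flags_of(struct extended_socket_object *esock)
{
    assert(esock != NULL);
    /* Step 1: struct pointer to pointer to its initial member (e_attr). */
    struct extended_base *eb = (struct extended_base *)esock;
    /* Step 2: struct pointer to pointer to its initial member (flags). */
    return (unsigned int *)eb;
}
```

The returned pointer compares equal to `&esock->e_attr.flags`, so writes through it land in the flags field.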

I'm not sure if converting to struct base * counts as "suitably converted" or not. My guess is that it would work on any machine you try it on, but I wouldn't recommend it.

I think it would be safest (and would also make the code clearer) if you simply keep a member of type struct base inside struct extended_base (instead of the member of type unsigned int). After doing that, you have two options:

1. When you want to send it to a function, write explicitly: &esock->e_attr.base (instead of (struct base *)esock). This is what I would recommend.
2. You can also write: (struct base *) (struct extended_base *) esock which is guaranteed to work, but I think it is less clear, and also more dangerous (if in the future you deliberately or accidentally add another member at the beginning of the struct).
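Here is a sketch of that restructuring. The member name `base` follows the expression above, the wrapper `set_esock_flag` is a hypothetical name, and the structs are trimmed for brevity:

```c
#include <assert.h>
#include <stddef.h>

struct base {
    unsigned int flags;
};

/* extended_base now embeds struct base as its first member
 * instead of repeating the unsigned int field. */
struct extended_base {
    struct base base;
    unsigned int mode;
};

struct extended_socket_object {
    struct extended_base e_attr;
    int e_id;
};

unsigned int set_flag(struct base *objattr, unsigned int flag)
{
    assert(objattr != NULL);
    objattr->flags |= flag;
    return objattr->flags;
}

/* Hypothetical wrapper showing the recommended call: no cast needed. */
unsigned int set_esock_flag(struct extended_socket_object *esock,
                            unsigned int flag)
{
    return set_flag(&esock->e_attr.base, flag);
}
```

With the member-access form the compiler checks the types for you, and the call keeps working if a field is later inserted in front of `e_attr`; the cast chain would silently point at the wrong member in that case.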

score 2 · Answer 3 · answered Mar 26 '21 at 13:38

After reading the standard's text, following the other answers (thanks!!), I think I may try to answer my own question (which was a bit misleading to begin with, see below).

As the other answers pointed out, there appear to be two somewhat overlapping concerns in this question:

1. "common initial sequence" -- in the standard documents this specifically refers to the context of a union having several structs as members, when these member structs share some compatible members beginning from the first. (§6.5.2.3 "Structure and union members", p6 -- Thanks, @dbush!)

My reading: the language spec suggests that, if at the site of access to these "apparently" different structs it is made clear that they actually belong to the same union, and that the access is done through the union, it is permitted; otherwise, it is not.

I think the requirement is meant to work with type-based aliasing rules: if these structs do indeed alias each other, this fact must be made clear at compile time (by involving the union). When the compiler sees pointers to different types of structs, it can't, in the most general case, deduce whether they may have belonged to some union somewhere. In that case, if it invokes type-based alias analysis, the code will be miscompiled. So the standard requires that the union is made visible.

2. "a pointer (to struct), when suitably converted, points to its initial member" (§6.7.2.1 "Structure and union specifiers", p15) -- this sounds tantalizingly close to 1., but it's less about aliasing than about a) the implementation requirements for struct and b) "suitable conversion" of pointers. (Thanks, @Orielno!)

My reading: the "suitable conversion" appears to mean "see everything else in the standard", that is, no matter if the "conversion" is performed by type cast or assignment (or a series of them), being "suitable" suggests "all constraints must be satisfied at all steps". The "initial-member" rule, I think, simply says that the actual location of the struct is exactly the same as the initial member: there cannot be padding in front of the first member (this is explicitly stated in the same paragraph).

But no matter how we make use of this fact to convert pointers, the code must still be subject to constraints governing conversion, because a pointer is not just a machine representation of some location -- its value still has to be correctly interpreted in the context of types. A counterexample would be a conversion involving an assignment that discards const from the pointed-to type: this violates a constraint and cannot be suitable.
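To make the layout part of rule 2 concrete, here is a small check, as a sketch only. The helper name `starts_at_first_member` is hypothetical and the structs are trimmed versions of the ones in my question. It only verifies that the struct and its initial member share an address; it says nothing about which pointer types may be used to access that address:

```c
#include <assert.h>
#include <stddef.h>

struct extended_base {
    unsigned int flags;
    unsigned int mode;
};

struct extended_socket_object {
    struct extended_base e_attr;
    int e_id;
};

/* Hypothetical helper: returns 1 when the object, its first member, and
 * that member's first member all live at the same address. */
int starts_at_first_member(const struct extended_socket_object *obj)
{
    assert(obj != NULL);
    return offsetof(struct extended_socket_object, e_attr) == 0
        && (const void *)obj == (const void *)&obj->e_attr
        && (const void *)obj == (const void *)&obj->e_attr.flags;
}
```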

The somewhat misleading thing in my original post was to suggest that rule 2 had something to do with "common initial sequence", whereas it is not directly related to that concept.

So for my own question, I tend to answer, to my own surprise, "yes, it is valid". The reason is that the pointer conversion by cast in expression (struct base *)esock is "legal in the letter of the law" -- the standard simply says that (§6.5.4 "Cast operators", p3)

Conversions that involve pointers, other than where permitted by the constraints of 6.5.16.1 (note: constraints governing simple assignment), shall be specified by means of an explicit cast.

Since the expression is indeed an explicit cast, in and by itself it doesn't contradict the standard. The "conversion" is "suitable". The subsequent call to set_flag() then correctly dereferences the pointer by virtue of the suitable conversion.

But! Indeed the "common initial sequence" becomes important when we want to improve the code. For example, in @dbush's answer, if we want to "inherit from multiple bases" via union, we must make sure that access to base is done where it's apparent that the struct is a member of the union. Also, as @Orielno pointed out, when the code makes us worry about its validity, perhaps switching to an explicitly safe alternative is better even if the code is valid in the first place.

score 0 · Answer 4 · answered Mar 31 '21 at 17:17

In the language the C Standard was written to describe, an lvalue of the form ptr->memberName would use ptr's type to select a namespace in which to look up memberName, add the offset of that member to the address in ptr, and then access an object of that member type at that address. Once the address and type of the member were determined, the original structure object would play no further role in the processing of the expression.
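That "address plus offset" model can be written out by hand, as a sketch. The struct `sample` and the helper `type_address_by_offset` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

struct sample {
    unsigned int flags;
    int id;
    int type;
};

/* Hypothetical helper: computes the address of p->type manually, by adding
 * the member's offset to the address of the enclosing structure. */
int *type_address_by_offset(struct sample *p)
{
    assert(p != NULL);
    return (int *)((char *)p + offsetof(struct sample, type));
}
```

The result is the same address as `&p->type`; what the model leaves out is the type-based aliasing information a modern compiler attaches to the access.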

When C99 was being written, there was a desire to avoid requiring that a compiler given something like:

struct position {double x,y,z; };
struct velocity {double dx,dy,dz; };

void update_positions(struct position *positions, struct velocity *vv, int count)
{
  for (int i=0; i<count; i++)
  {
    positions[i].x += vv->dx;
    positions[i].y += vv->dy;
    positions[i].z += vv->dz;
  }
}

must allow for the possibility that a write to e.g. positions[i].y might affect the object of vv->dy even when there is no evidence of any relationship between any object of type struct position and any object of type struct velocity. The Committee agreed that compilers shouldn't be required to accommodate interactions between different structure types in such cases.

I don't think anyone would have seriously disputed the notion that in situations where storage is accessed using a pointer which is freshly and visibly converted from one structure type to another, a quality compiler should accommodate the possibility that the operation might access a structure of the original type. The question of exactly when an implementation would accommodate such possibilities should depend upon what its customers were expecting to do, and was thus left as a quality-of-implementation issue outside the Standard's jurisdiction. The Standard wouldn't forbid implementations from being willfully blind to even the most obvious cases, but that's because the dumber something would be, the less need there should be to prohibit it.

Unfortunately, the authors of clang and gcc have misinterpreted the Standard's failure to forbid them from being obtusely blind to the possibility that a freshly-type-converted pointer might be used to access the same object as a pointer of the original type, as an invitation to behave in such fashion. When using clang or gcc to process any code which would need to make use of the Common Initial Sequence guarantees, one must use -fno-strict-aliasing. When using optimization without that flag, both clang and gcc are prone to behave in ways inconsistent with any plausible interpretation of the Standard's intent. Whether one views such behaviors as being a result of a really weird interpretation of the Standard, or simply as bugs, I see no reason to expect that gcc or clang will ever behave meaningfully in such cases.

I'm a bit confused by this: "When using clang or gcc to process any code which would need to make use of the Common Initial Sequence guarantees, one must use `-fno-strict-aliasing`" Since the code that presumably uses the Common Initial Sequence already does so via a union visible to the block, why is the flag a "must"? I think the compiler in that case can simply deduce that the structs are aliased as members of the union. Do you have an example where the compiled code turns out different depending on compiler/flag? — Zoë the Scribe, Apr 02 '21 at 12:51
@ZoëtheScribe: In the language the C Standard was written to describe, and all versions of that language dating back at least to 1974, pointers to structures which shared a Common Initial Sequence could be used interchangeably to inspect members of that Common Initial Sequence, and the CIS guarantee was exploited far more often in the context of pointers than in the context of unions (which didn't even *exist* in 1974!) If the CIS were only specified in the context of structure pointers, such a specification wouldn't have been applicable to unions whose address was never taken. — supercat, Apr 02 '21 at 16:44
By contrast, if a function receives a `struct s1*` from code in an outside compilation unit, casts it to a `struct s2*`, and accesses a member thereof, it would be impossible for a compiler to uphold the CIS guarantee for unions without *also* upholding it for structures. The C89 Standard didn't expressly specify that the CIS guarantee applies to structure pointers as well as unions, even though that's by far the more common situation where it can be usefully exploited, because such specification would have been seen as redundant. On the other hand... — supercat, Apr 02 '21 at 16:56
...both clang and gcc ignore the CIS guarantees even in situations where a complete union type definition is visible, and even when code uses the pattern `processStruct1(&unionArray[i].thing1); processStruct2(&unionArray[j].thing2); processStruct1(&unionArray[i].thing1);`, and for that matter even in some cases where union members are written before their address is taken. Whether that should be viewed as a bug, or merely a really bizarre reading of the Standard, I see no reason to believe clang or gcc will ever be designed to reliably process such code meaningfully. — supercat, Apr 02 '21 at 16:59
@ZoëtheScribe: See https://gcc.godbolt.org/z/dMTMq6ndW for an example of a situation in which both clang and gcc process unions nonsensically. Perhaps their behavior in this example is a result of their being simply broken, rather than being a result of a narrow interpretation of CIS rules, but I think what happened is that one phase of optimization simplified code in a manner that would have been legitimate if a later stage would uphold CIS guarantees, but then a later stage failed to uphold the CIS guarantee expected by the earlier stage. I find gcc's generated code especially bizarre... — supercat, Apr 02 '21 at 17:14
...since it doesn't optimize out the reload of `uarr[i].v1.x` (known as `uarr[0+rdi*4]` in the machine code) but performs it before the increment of `uarr[j].v2.x` (known as `uarr[0+rsi*4]` in the machine code), meaning that gcc exhibits the same wrong behavior as if it had omitted the instruction, but without reaping the performance benefit. — supercat, Apr 02 '21 at 17:17
