Is the alignment requirement for incomplete `struct X` and `struct Y` the same?

Question

An answer to "C: When is casting between pointer types not undefined behavior?" indicates that casting back and forth via a pointer type with no stricter alignment requirement is safe.

The alignment is important here even if the types are incomplete, as the standard states:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

Now the question is what the correct alignment requirement of struct X is. Of course it depends on its contents if the type is complete, but what if it is incomplete (that is, only declared as struct X;)? Put another way, can we cast from struct X* to struct Y* and back and get the same pointer again?

If the alignment requirements of a complete and an incomplete struct X differ, would that be a complication? I can't come up with an actual conversion between them, but you could pass a struct X* between translation units in which struct X is complete and incomplete, respectively. Or is this a special case that's always allowed anyway?

First, you have to explain how an incomplete structure can even have an alignment requirement - an incomplete structure can't be created. It can only be created by code that has access to the complete definition of the structure, and then an address of a complete structure can be passed to other code where the definition is incomplete. — Andrew Henle, Apr 19 '17 at 09:47
@AndrewHenle Oddly enough the standard requires the resulting pointer to be correctly aligned for the pointed-to type even if that type is incomplete. So there seems to be some kind of alignment requirement even for an incomplete type. — skyking, Apr 19 '17 at 10:06
"Oddly enough the standard requires the resulting ***pointer*** to be correctly aligned ..." Alignment restrictions on the pointer are not alignment restrictions on any structure it might point to. — Andrew Henle, Apr 19 '17 at 10:13
@AndrewHenle I interpret it as it's the value of the pointer that must be correctly aligned not the storage. Requiring that the storage of the pointer to be aligned with what it points to doesn't make much sense. — skyking, Apr 19 '17 at 10:40
This is not a problem. There is very little you can do with an incomplete type, just *declare* a pointer to it. The *assignment* to the pointer is where the pedal hits the metal and alignment starts to play a role, a C compiler won't let you do this without it knowing the complete type. — Hans Passant, Apr 19 '17 at 10:57
@HansPassant Check the answer to the other question and the standard says that the alignment is important nevertheless. It states that the very conversion has undefined behavior should the resulting pointer be incorrectly aligned for the pointed-to type. That there's little I can do with the incomplete type doesn't help much when UB already has occurred... — skyking, Apr 19 '17 at 11:06
@HansPassant: Neither conversion of a `void*` to a pointer-to-structure type, nor storage of such a pointer in an object of pointer-to-structure type, requires that a complete structure definition be visible, nor even exist anywhere. A system where `int*` is smaller than `char*` or `void*` may opt to either have structure pointers be large enough to accommodate arbitrary alignment, or may require that all structures be word-aligned even if all members are of type `char`, but all structure pointer types must have the same representation. All union pointers must likewise have the... — supercat, Apr 19 '17 at 21:13
...same representation as each other, though it is not required to be compatible with the representation for structure types. Commonplace systems, of course, use the same representation for both, but that is not required. — supercat, Apr 19 '17 at 21:17

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

To start with, I don't think you can talk about the alignment requirement of an incomplete type, because alignment requirements are only defined for complete types:

Complete object types have alignment requirements which place restrictions on the addresses at which objects of that type may be allocated. (§6.2.8/1; all standard citations are taken from n1570.pdf, which is effectively C11)

But in a valid program, a pointer must point to an object of complete type (or NULL). So even if the referenced type of a pointer is incomplete in some translation unit, the object referred to by that pointer (if there is one) must have a complete type and thus an alignment requirement. However, in the translation unit in which the referred type is incomplete, there is no way to know what that alignment requirement is, neither for the compiler nor for the person writing the code.

So the only way a pointer to an incomplete type can be legitimately assigned a non-NULL value is by means of a copy of a pointer to the same type from somewhere where the type is complete (perhaps in another translation unit). This is not a conversion, since the two types are compatible, and it is guaranteed to work. Similarly, one of the few legitimate uses for a value of a pointer to an incomplete type is to pass it to a pointer to the same type in a context where the type is complete. (Other uses include comparing it for equality with another pointer value or converting it to an integer and printing it out, but these don't involve converting to a different pointer type.)
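
The opaque-type use case described above can be sketched roughly as follows. This is only an illustration: `struct counter` and the `counter_*` functions are made-up names, and both "sides" are squeezed into one file, whereas in a real program the definition would live in a separate translation unit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque type: only the tag is visible at this point. */
struct counter;

/* The interface a client would see (illustrative names). */
struct counter *counter_new(int start);
int counter_next(struct counter *c);
void counter_free(struct counter *c);

/* Client code: struct counter is still incomplete here. Copying the
   pointer is fine because no conversion between pointer types occurs. */
int client_use(void)
{
    struct counter *c = counter_new(41); /* value produced where the type is complete */
    if (c == NULL)
        return -1;
    struct counter *alias = c;           /* same type, so a plain copy */
    int v = counter_next(alias);         /* handed back to code that sees the definition */
    counter_free(c);
    return v;
}

/* "Implementation side": from here on the type is complete. */
struct counter { int value; };

struct counter *counter_new(int start)
{
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = start;
    return c;
}

int counter_next(struct counter *c)
{
    assert(c != NULL);
    return ++c->value;
}

void counter_free(struct counter *c)
{
    free(c);
}
```

The client never needs to know the alignment of `struct counter`, because every non-null value it handles originated in code where the type was complete.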

So, in summary, pointers to incomplete types are useful and usable precisely in the expected use case -- where the referenced type is opaque -- but cannot be portably converted to pointers of any other type except for possibly qualified char* and void*.

The standard does not guarantee that a pointer can be converted to a pointer type whose referenced type might have a more stringent alignment. If the alignment of the referred type is unknown, the only pointers whose target alignment cannot be more stringent are char* and void*. So any conversion of a pointer to an incomplete type to a type other than char* or void* must be regarded as unportable.
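
As a minimal sketch of the conversions that remain portable, the following round-trips a pointer to an incomplete type through `void *` and through `unsigned char *`. The function names and the 64-byte allocation are arbitrary choices for the illustration; `struct X` is deliberately never completed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct X; /* incomplete throughout this sketch */

/* Round trip through void *: the result compares equal to the original. */
int roundtrip_void(struct X *p)
{
    void *v = p;
    struct X *q = v;
    return q == p;
}

/* Round trip through a character pointer: a character type has the
   weakest alignment requirement, so the first conversion cannot
   produce a misaligned pointer. */
int roundtrip_char(struct X *p)
{
    unsigned char *b = (unsigned char *)p;
    struct X *q = (struct X *)b;
    return q == p;
}

/* Illustrative driver: malloc returns storage suitably aligned for any
   object type with a fundamental alignment requirement. */
int demo_roundtrip(void)
{
    void *mem = malloc(64);
    if (mem == NULL)
        return 0;
    struct X *p = mem;
    int ok = roundtrip_void(p) && roundtrip_char(p);
    free(mem);
    return ok;
}
```

Replacing `unsigned char *` with, say, `struct Y *` in `roundtrip_char` would compile just as well, but nothing in the standard would then promise that the intermediate pointer is correctly aligned.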

Really, the fact that the referred type is incomplete is not hugely relevant. The standard does not specify the alignment of a composite type. Such an alignment must be sufficient to allow the compiler to correctly align members, but it could be arbitrarily large. In other words, there is no guarantee that the types:

typedef char oneChar; 
struct oneChar_s { char x; };
union  oneChar_u { char x; };

have the same alignment. It is quite possible for the two composite types to have larger alignments than 1. So there is no guarantee that it is possible to convert a oneChar* to a oneChar_s* (unless, of course, the oneChar* were the result of a previous conversion in the opposite direction), and a portable program would not try. In this sense, it makes no difference whether the definition of struct oneChar_s is visible or not.
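
One way to see what a particular implementation chose is to query the alignments directly. The sketch below repeats the three declarations and wraps C11 `_Alignof` in small helper functions (the helper names are made up for this example).

```c
#include <assert.h>
#include <stddef.h>

typedef char oneChar;
struct oneChar_s { char x; };
union  oneChar_u { char x; };

/* A composite must be aligned at least as strictly as its members;
   the standard sets no upper bound beyond that. */
static_assert(_Alignof(struct oneChar_s) >= _Alignof(char),
              "struct alignment must accommodate its member");
static_assert(_Alignof(union oneChar_u) >= _Alignof(char),
              "union alignment must accommodate its member");

size_t align_of_typedef(void) { return _Alignof(oneChar); }
size_t align_of_struct(void)  { return _Alignof(struct oneChar_s); }
size_t align_of_union(void)   { return _Alignof(union oneChar_u); }
```

Printing the three values with `printf("%zu\n", ...)` shows what a given implementation picked. The language only promises that the last two are at least as large as the first, so a portable program cannot rely on them being equal.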

Not coincidentally, the standard does not guarantee that all object pointers have the same size. The underlying theory is that on some architectures, ordinary pointers are not sufficiently precise to refer to single bytes, but there is the possibility of augmenting the pointer with the addition of, for example, a bit offset. Indeed, it might be the case that there are other small objects which can be packed into words, which also require augmented pointer representations but with less precision than a bit offset.

In such an architecture, it is not possible to take advantage of different pointer precisions for small composite objects, because the standard insists that there be at most two representations of pointers to composites, one for structs and one for unions:

All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. (§6.2.5/27) [Note 2]

In particular, this means that pointers to objects of a given type have the same representation, regardless of whether the type is complete.
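
The guarantee can be checked at compile time even for types that are never completed, since a pointer to an incomplete type is itself a complete type. A small sketch, with placeholder tags `A`, `B`, `U` and `V`:

```c
#include <assert.h>
#include <stddef.h>

struct A; /* none of these four types is ever completed */
struct B;
union U;
union V;

/* Same representation and alignment for all struct pointers ... */
static_assert(sizeof(struct A *) == sizeof(struct B *),
              "struct pointers share a representation");
static_assert(_Alignof(struct A *) == _Alignof(struct B *),
              "struct pointers share an alignment requirement");

/* ... and likewise among all union pointers. */
static_assert(sizeof(union U *) == sizeof(union V *),
              "union pointers share a representation");
static_assert(_Alignof(union U *) == _Alignof(union V *),
              "union pointers share an alignment requirement");

/* Helpers so the sizes can also be inspected at run time. */
size_t struct_ptr_size(void) { return sizeof(struct A *); }
size_t union_ptr_size(void)  { return sizeof(union U *); }
```

Note that there is deliberately no assertion comparing `struct_ptr_size()` with `union_ptr_size()`: the standard does not require struct pointers and union pointers to match each other.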

Difference in pointer representations is not the only reason that a conversion to a more constrained alignment might fail. For example, an implementation might insert code to verify alignment after a conversion (perhaps in response to a sanitizing compiler option). In the case of an incomplete type, the compiler would not be able to insert static code to do the check (although some kind of runtime check might be possible), but the fact that the compiler might omit the validation code would not alter the undefinedness of the result.
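
A hand-written version of such a check might look like the sketch below. It rests on an assumption the standard does not make: that converting a pointer to `uintptr_t` yields a numeric address whose value modulo the alignment is meaningful, which holds on typical flat-address machines. The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: (uintptr_t)p is a flat numeric address. The mapping is
   implementation-defined, so this is not a portable test. */
int is_aligned_for(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}

/* Convert only when the check passes; otherwise return NULL instead of
   performing a conversion whose behavior would be undefined. */
double *checked_to_double(void *p)
{
    if (!is_aligned_for(p, _Alignof(double)))
        return NULL;
    return p;
}
```

With an incomplete referenced type there is no `_Alignof` value to pass in, which is why a compiler could not generate the equivalent check statically.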

For what it's worth, the standard citation in the OP was for C99; in C11, it has been modified slightly (emphasis added to indicate changed wording):

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the **referenced** type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object. (§6.3.2.3/7)

In my opinion, this change is purely editorial. It derives from the decision to change the definition of "object type" in §6.2.5/1. In C99, there were three kinds of type: object types, function types, and incomplete types. In C11, there are only two kinds -- object types and function types -- with the comment that "At various points within a translation unit an object type may be incomplete… or complete…", which is a more accurate description.

Notes

1. As a completely hypothetical example, consider a machine conceptually similar to the PDP-6/10 architecture. This is a word-addressed machine with a large word size; a word is large enough to contain two addresses (a fact which a hypothetical LISP implementation could take advantage of in order to store a cons node consisting of car and cdr fields into a single word). Because it is desired to represent vectors of small objects efficiently, the machine also has instructions which can extract or overwrite a bitfield within a word, where the bitfield pointer consists of a word pointer accompanied by an offset and length information. (Hence, a word can contain only one bitfield pointer.) (The hardware has an instruction which can increment bitfield pointers by adding the length to the offset and moving to the bitfield starting at offset 0 of the next word if necessary.)

So there could be three different pointer types:
- a fullword character pointer consisting of an address and a bitfield offset/length.
- an ordinary halfword pointer to any word-aligned object type.
- an augmented halfword-plus-one-bit pointer to a pointer to a word-aligned object, consisting of an address and an indication of whether the address is in the first or second half of the word. (This representation also probably requires a fullword, but the encoding is simpler. But there might also be some other place for the extra bit. It's a hypothetical example, remember.)

In this hypothetical architecture, the conversion rules become quite complicated. For example, you can convert a char** to an int* because the alignment of int is the same as the alignment of char*. But you cannot convert an int** to an int* because the alignment of int is greater than the alignment of int*.

Rather than memorizing these complex rules, a programmer would probably choose to simply forbear from performing pointer conversions other than those guaranteed to be portable, namely round-tripping through char* or void*.

2. It would be possible for pointers to all composites to use the larger, more precise pointer type, even if unnecessarily. It seems to me much more likely that an implementation would simply choose to impose a minimum alignment on structs, if not all composite objects. The wording of the standard would allow an implementation to use a minimum alignment for structs and an augmented pointer representation for all unions.

What does the bold-face word "referenced" mean in the above quote? Does it mean "resulting"? Is this a place (one of many) where the Standard uses a term without really defining it? — supercat, Apr 19 '17 at 21:22
@supercat: 6.2.5 para 20: "A pointer type may be derived from a function type or an object type, called the referenced type...." — rici, Apr 19 '17 at 22:01
@supercat: I put it in bold because it is a change in wording from C99, in which the word was "pointed-to". "Referenced", which is precisely defined, is much clearer, IMHO. — rici, Apr 19 '17 at 22:04
Fair enough; I tend to use the term "target type", since the past-participle "referenced" is often used to refer to something referenced *in text*. Does the Standard specify that a pointer's "referenced" type is the type of object to which the pointer points? — supercat, Apr 19 '17 at 22:26
@supercat: I just quoted the definition in my comment! It's the type from which the pointer type is derived. IOW, if you have a pointer type X*, its referenced type is X. "Pointed-to" is unsatisfactory, since the type doesn't point at anything, and the value might not point at an object (since it might be NULL). — rici, Apr 19 '17 at 22:28
Also, while it would certainly be helpful to be able to say that the alignment requirement for a pointer to a union will be the coarsest alignment among *the members that are accessed via that pointer*, nothing in the Standard specifies that nor provides any means by which code could refuse to compile on an implementation that couldn't guarantee such behavior. If such behavior were guaranteed, then it might be sensible to say that code which wants to use a `uint16_t*` to access what *might* be part of a `uint32_t` could avoid aliasing issues by converting the pointer... — supercat, Apr 19 '17 at 22:34
... to a union containing both types and then using the union to perform the access. Code shouldn't need to know or care about whether the `uint16_t*` was 32-bit aligned if only the `uint16_t` member was used for access. I don't know to what extent compilers with multiple pointer representations have tried to support such a design, but today I wouldn't count on gcc to support such a thing even on platforms where all pointers use the same representation. — supercat, Apr 19 '17 at 22:38
@supercat The "target type" is a bad choice since the target type is the pointer and it's the alignment requirement of the "pointed-to" type that is the culprit. — skyking, Apr 20 '17 at 06:52
@skyking: In this context, it doesn't matter whether the alignment is correct for something to *be* a pointer to an object of type targeted *by* the destination pointer, or whether it's correct to be a pointer stored *in* the destination pointer--the two meanings would be equivalent. There are, however, two types "referenced" *by the text*. — supercat, Apr 20 '17 at 14:39
I just noticed the paragraph about "So the only way a pointer to an incomplete type can be legitimately assigned a non-null value..." but I don't see what would be "illegitimate" about a program which knows the size of a couple of incomplete types (perhaps stored in imported symbols) doing something like `struct tag1 *p1 = malloc(structSize1 + structSize2); struct tag2 *p2 = (struct tag2 *)((char *)p1 + structSize1);` if the size of the first structure is known to be compatible with the alignment of the second. — supercat, Aug 16 '18 at 16:53
