Consider these structs on a common 64-bit system:

struct V1 {         // size 1, alignment 1
    uint8_t size;   // offset 0, size 1, alignment 1
    uint8_t data[]; // offset 1, size 0, alignment 1
};

struct V2 {        // size 12, alignment 4
    char c;       // offset  0, size 1, alignment 1
    int length;   // offset  4, size 4, alignment 4
    char b;       // offset  8, size 1, alignment 1
    short blob[]; // offset 10, size 0, alignment 2
};

In the first case the data member sits right at the end of the struct and takes up no space. This causes the following oddity:

struct V1 blobs[2];
&blobs[0].data == &blobs[1].size

Luckily the C standard §6.7.2.1, paragraph 3 says:

A structure or union shall not contain a member with incomplete or function type,… except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.

So the above array is illegal and there is no problem with the addresses being the same.

What if I have code that, given a size, creates such structures in a contiguous block of memory that was pre-allocated? Would it be illegal for it to create instances with size == 0 because that would basically be an array of the struct?

Secondly I have a problem with V2. The compiler adds extra padding at the end of V2 so the size is a multiple of the alignment. This is necessary for structs in an array so the following structs remain properly aligned. But V2 must never be placed in an array so I fail to see why there should be any padding at the end of V2.

In fact I would go so far as to say it is wrong to add padding there. It obfuscates calculating the size of the struct for a given length of blob because now the offset of blob has to be considered instead of the size of the struct.

align = _Alignof(struct V2);
needed_size = offsetof(struct V2, blob) + length;   // beware of overflow
needed_size = (needed_size + align - 1) & ~(align - 1); // round up to a multiple of align; beware of overflow

Is there something I'm missing why struct V2 must be padded?

Goswin von Brederlow
  • `creates such structures` how do you "create structures"? (note: be aware of https://stackoverflow.com/questions/38515179/is-it-possible-to-write-a-conformant-implementation-of-malloc-in-c) `The compiler adds` which compiler? `Secondly I have a problem with V2.` This looks like a nice question. You should consider asking a separate question. – KamilCuk Mar 11 '22 at 18:42
  • I expect 6.7.2.1 3 means there should not be a type that is an array whose elements are structures with flexible array members but does not mean you cannot place such structures consecutively in memory (by writing the bytes that represent them to memory). It would be up to you to calculate the start address of them using pointer arithmetic, not using a pointer to the first one (`p`) as if it were an array that could access later ones (`p[i]`). If your compiler is allowing this, it should not when conforming to the C standard. – Eric Postpischil Mar 11 '22 at 18:56
  • Re “Would it be illegal for it to create instances with size == 0 because that would basically be an array of the struct?” No. Just because we put two things in memory does not mean we treat them as an array. This constraint of the standard is not trying to limit where you can place the structures; it is avoiding using them in pointer arithmetic. – Eric Postpischil Mar 11 '22 at 18:59
  • V2 is 4-byte aligned because of the `int length` member, not because of `blob`. – rici Mar 11 '22 at 19:55
  • @rici: OP knows that. The question is not about why the structure has an alignment requirement of 4 but why it has padding to make its size a multiple of that alignment requirement. Since it will never be in an array, there is no need for padding to make the next element of the array aligned. – Eric Postpischil Mar 11 '22 at 19:58
  • @eric: if you follow that logic, then adding an extra member at the end of a struct could make the struct *smaller*, since it would have alignment 4 and size 12 without the flexible member. It's hard to see how that could be allowed. See clause 22 in Example 2 of 6.7.2.1. – rici Mar 11 '22 at 20:33
  • @KamilCuk Yes, implementing a malloc like function for such objects would be the idea. As for which compiler adds: gcc, clang, circle for x86, amd64, arm and arm64. – Goswin von Brederlow Mar 11 '22 at 23:30
  • @EricPostpischil I would agree that `p[i]`, `*(p+i)`, `*(++p)` are all illegal. I could even accept that `p++` is illegal as any pointer to such a struct can only point to a single object and doing pointer arithmetic makes no sense (is &v1 an array of size 1? can't be). – Goswin von Brederlow Mar 11 '22 at 23:33
  • @EricPostpischil The problem I see with allowing a function to create V1 with a size 0 is that they can appear in memory one after the other and then the pointer equal I showed with the array would be true. – Goswin von Brederlow Mar 11 '22 at 23:35
  • Consider a malloc implementation that stores all objects of size 1 in a bucket with a bitmap to manage them. Allocate 2 `struct V1` and you get the same effect as the arrays with 2 pointers aliasing. – Goswin von Brederlow Mar 12 '22 at 00:11
  • @GoswinvonBrederlow: It is not a problem for those two pointers to be equal. I do not see why you think it would be a problem. If you define `int a, b;` in `main`, it may be true that `&a + 1 == &b` or vice-versa. So what? The fact that two things are adjacent in memory does not mean they are an array or violate any rule against forming an array. **It is not a violation of the C standard for two instances of a structure with a flexible array member to be adjacent.** – Eric Postpischil Mar 12 '22 at 00:20
  • They are not adjacent, they are identical. It's as if you have `int a, b; &a == &b;`. It's probably a violation to pass the two `char*` to anything that would compare them, but it's still surprising. – Goswin von Brederlow Mar 12 '22 at 00:34
  • I actually just thought of another such case: `struct { char c[4] } x[2]; &(x[0].c[4]) == &(x[1].c[1])` so there really is nothing special going on with the variable arrays. You always have this overlap of pointers. Never mind me. – Goswin von Brederlow Mar 12 '22 at 00:39
  • @GoswinvonBrederlow: The two structures are adjacent. `&blobs[0].data` is the address of the `data` member, which is zero bytes; there are no actual objects in it. This address is also the address of the end of `blobs[0]`. `&blobs[1].size` is the address of the `size` member, which is also the address of the start of `blobs[1]`. Regardless of how you interpret it, there is simply no prohibition in the C standard against `&blobs[0].data == &blobs[1].size`. – Eric Postpischil Mar 12 '22 at 00:40
  • @GoswinvonBrederlow: That should be `&x[0].c[4] == &x[1].c[0]` (`0` not `1` in the last subscript). – Eric Postpischil Mar 12 '22 at 00:41
  • @EricPostpischil You are right, typo. – Goswin von Brederlow Mar 12 '22 at 00:46
  • `s that they can appear in memory one after the other and then the pointer equal I showed with the array would be true` [pointer provenance Can equality testing on pointers be affected by pointer provenance information?](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2263.htm#q2.-can-equality-testing-on-pointers-be-affected-by-pointer-provenance-information) – KamilCuk Mar 12 '22 at 06:08

2 Answers

What if I have code that, given a size, creates such structures in a contiguous block of memory that was pre-allocated? Would it be illegal for it to create instances with size == 0 because that would basically be an array of the struct?

As @EricPostpischil explained in comments, the constraint in question is not about the layout of objects in memory, but rather about the declared element type of an actual array. An object that is not declared as an array is not an array in the relevant sense, no matter how array-like it may seem, or how we think about it or use it. So no, the language spec does not forbid what you describe.
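To make the scenario concrete, here is a minimal sketch of placing two such structures back to back in one allocated block without ever declaring an array of them. The helper names `place_v1` and `empty_v1_shares_address_with_next` are hypothetical and not from the question; the sketch only illustrates that the address of an empty instance's `data` coincides with the start of the next instance.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct V1 {
    uint8_t size;   /* number of payload bytes that follow */
    uint8_t data[]; /* flexible array member */
};

/* Hypothetical helper (not from the question): writes one struct V1 with
 * `size` payload bytes at `at` and returns the address just past it.
 * Assumes struct V1 has alignment 1, as in the question. */
static unsigned char *place_v1(unsigned char *at, uint8_t size)
{
    struct V1 *v = (struct V1 *)at;
    v->size = size;
    memset(v->data, 0, size);
    return at + offsetof(struct V1, data) + size;
}

/* Places an empty instance followed by a 3-byte instance in one malloc'ed
 * block. Returns 1 if first->data and &second->size are the same address,
 * 0 if they differ, and -1 on allocation failure. */
static int empty_v1_shares_address_with_next(void)
{
    unsigned char *block = malloc(16);
    if (block == NULL)
        return -1;

    struct V1 *first = (struct V1 *)block;
    unsigned char *next = place_v1(block, 0);   /* empty instance */
    struct V1 *second = (struct V1 *)next;
    place_v1(next, 3);                          /* instance with 3 bytes */

    int same = (void *)first->data == (void *)&second->size;
    free(block);
    return same;
}
```

Each instance is located by explicit byte arithmetic on the block, never by indexing a `struct V1 *` as if it pointed into an array.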

The compiler adds extra padding at the end of V2 so the size is a multiple of the alignment. This is necessary for structs in an array so the following structs remain properly aligned. But V2 must never be placed in an array so I fail to see why there should be any padding at the end of V2.

The C language specification permits implementations to pad structure layouts after any member, including the last, at their own discretion. Among the primary purposes is to allow structure members to be properly aligned, including, but not limited to, within arrays of structures, but use of padding in structure layouts is not contingent on there being an alignment-based justification.

In fact I would go so far as to say it is wrong to add padding there.

"Wrong" is a strong word. Especially in the context of a language-lawyer question, you should back it up with an argument based on the language specification. I don't think you can do that.

It obfuscates calculating the size of the struct for a given length of blob because now the offset of blob has to be considered instead of the size of the struct.

Not exactly true. If you want to compute the minimum possible size into which an instance of your structure can fit then yes, you need to take the offset of the FAM into account. However,

  1. That's not a function of there being padding, but rather of the offset of the FAM differing from the size of the structure. That can't happen without padding, but it doesn't have to happen with padding.

  2. If you are so space-constrained that you cannot accommodate the possibility of a few bytes of overallocation for the sake of clearer code, then dynamic allocation and FAMs probably are not a good idea in the first place. In particular, the allocator itself typically does not allocate with single-byte granularity.

  3. Substituting an offsetof expression for a sizeof expression is hardly obfuscatory. It might even be clearer, since then the name of the FAM actually appears in the size computation. Your particular example code is somewhat overcomplicated, however, by the unnecessary measure employed to make the allocation size a multiple of the structure's alignment requirement.

Although the size of a structure type that has a FAM does not include the size of the FAM itself, it does include any padding between the penultimate member and the FAM, and possibly more:

In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.

(C17 6.7.2.1/18)

Thus, a pretty tight upper bound on the space needed for a structure of type struct S that has a flexible array member fam of type fam_t can be calculated as:

size_t bytes_needed = sizeof(struct S) + num_fam_elements * sizeof(fam_t);

That is in fact idiomatic, but if you prefer

size_t bytes_needed = offsetof(struct S, fam) + num_fam_elements * sizeof(fam_t);
if (bytes_needed < sizeof(struct S)) {
    bytes_needed = sizeof(struct S);
}

for the absolute minimum then I see nothing objectionable about that form.
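Applied to the question's struct V2, the minimum-size form might look like the sketch below. The function names `v2_bytes_needed` and `v2_alloc` are hypothetical, and the sketch assumes that the element count is what gets stored in `length`; the overflow check is an addition prompted by the question's "beware of overflow" remark.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct V2 {
    char c;
    int length;
    char b;
    short blob[];
};

/* Smallest allocation size that holds a struct V2 with n blob elements,
 * never less than sizeof(struct V2). Returns 0 if the size would overflow. */
static size_t v2_bytes_needed(size_t n)
{
    const size_t header = offsetof(struct V2, blob);
    if (n > (SIZE_MAX - header) / sizeof(short))
        return 0;
    size_t bytes = header + n * sizeof(short);
    return bytes < sizeof(struct V2) ? sizeof(struct V2) : bytes;
}

/* Hypothetical allocator: returns NULL on overflow or allocation failure.
 * Assumption: `length` records the number of blob elements. */
static struct V2 *v2_alloc(size_t n)
{
    if (n > (size_t)INT_MAX)
        return NULL;
    size_t bytes = v2_bytes_needed(n);
    if (bytes == 0)
        return NULL;
    struct V2 *v = malloc(bytes);
    if (v != NULL) {
        v->c = 0;
        v->b = 0;
        v->length = (int)n;
    }
    return v;
}
```

No rounding up to the structure's alignment is performed, since `malloc` already returns suitably aligned storage and nothing is placed after the object.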

Is there something I'm missing why struct V2 must be padded?

Undoubtedly so, as you observe your implementation to pad it, but the implementation does not owe you an explanation.

Nevertheless, your implementation most likely applies a combination of rules such as these:

  • the alignment requirement for a structure type is the same as the strictest alignment requirement of any of its members, and
  • the size of a structure type is an integer multiple of its alignment requirement.

Neither of those is a rule of the language itself, but they are fairly common in practice. In particular, they are part of the System V x86_64 ABI, and undoubtedly of other ABIs, too. Note that although those rules do serve the purpose of ensuring that structure members can be properly aligned inside an array of structures, they make no exception for structure types that are not allowed to be the element type of an array.
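Whether a given implementation applies these two rules to struct V2 can be checked directly. The following is a small sketch; the function names are hypothetical, and it assumes that `int` is the member with the strictest alignment, which holds on the ABIs discussed here but is not guaranteed by the language.

```c
#include <stddef.h>

struct V2 {
    char c;
    int length;
    char b;
    short blob[];
};

/* Returns 1 if struct V2 follows both rules on this implementation:
 *   1. its alignment equals that of its strictest member (assumed: int),
 *   2. its size is an integer multiple of its alignment. */
static int v2_follows_common_rules(void)
{
    int alignment_rule = _Alignof(struct V2) == _Alignof(int);
    int size_rule = sizeof(struct V2) % _Alignof(struct V2) == 0;
    return alignment_rule && size_rule;
}

/* Bytes counted in sizeof(struct V2) that lie at or after the start of blob. */
static size_t v2_tail_padding(void)
{
    return sizeof(struct V2) - offsetof(struct V2, blob);
}
```

With the layout given in the question (size 12, `blob` at offset 10), `v2_tail_padding()` would come out as 2.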

John Bollinger
  • `sizeof(struct S) + num_fam_elements * sizeof(fam_t)` calculates two more bytes than are needed. All the bytes representing members of the structure, including the flexible array member, will fit in `sizeof(struct S)-2 + num_fam_elements * sizeof(fam_t)` bytes. Further, the padding included in the size by the compiler does not help with alignment here: with or without the `−2`, there will be some values of `num_fam_elements` for which that calculated size is a multiple of 4 and some for which it is not. So the 2 bytes of padding are not helping anything. – Eric Postpischil Mar 11 '22 at 20:02
  • I'm not following you there, @EricPostpischil. But to be clear, one needs `sizeof(struct S)` bytes for everything in the structure except the FAM, but including any padding immediately before the FAM, plus one needs storage for each element of the FAM, which is `num_fam_elements * sizeof(fam_t)`. The offset of the FAM in the structure is `sizeof(struct S)`, so this is neither more nor less than is needed. Whether the overall size computed that way is a multiple of 2 or 4 or any other number is not relevant. – John Bollinger Mar 11 '22 at 20:11
  • The offset of the FAM in the structure is not `sizeof(struct S)`. As OP’s results show (pasted into the comments), `sizeof(struct S)` is 12 bytes, but the offset of `blob` is 10 bytes. The extra two bytes are not needed and serve no purpose. As you note in the answer, these padding bytes exist because the compiler makes “no exception for structure types that are not allowed to be the element type of an array.” But that is just the cause of them, not a reason—the rules do not need to omit this case. – Eric Postpischil Mar 11 '22 at 20:54
  • Thank you, @EricPostpischil. I have updated this answer in light of your comments. – John Bollinger Mar 11 '22 at 21:22
  • I guess I'm asking for the rationale for C17 6.7.2.1/18 now. One possibility I could imagine is that if you put V2 with and without `blob` in a union, you can cast from V2 to V2_no_blob and pass a pointer to that to other functions. They would then assume the padding exists and might read or write the padding due to optimization. And the standard wants to keep that optimization valid. – Goswin von Brederlow Mar 11 '22 at 23:49
  • @GoswinvonBrederlow, the official rationale for the feature is presented in [the C99 rationale document](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n897.pdf), but I'm afraid that you're probably going to find it unsatisfying. It comes down to fitting an official feature to an existing non-standard programming trick (the "struct hack") that was deemed sufficiently common and useful to warrant it. For additional context, do consider that the committee seems never to have considered it a priority to mandate minimization of the size of structures. – John Bollinger Mar 12 '22 at 15:03

This answer addresses “Is there something I'm missing why struct V2 must be padded?”

If a compiler did not pad a structure type to be a multiple of its alignment requirement, then some structure types would violate this rule in C 2018 6.7.2.1 18:

… In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply…

To see this, consider this structure in an implementation where int is four bytes and has a four-byte alignment requirement:

struct s0
{
    int  i;
    char c;
};

This structure requires five bytes for its members, so it must be padded to eight bytes to satisfy the alignment requirements when used in an array. Next, we add a flexible array member:

struct s1
{
    int  i;
    char c;
    char a[];
};

This structure also requires five bytes for its inflexible members. None are required for the flexible array. If the compiler did not pad it to eight bytes, it would be shorter than struct s0, which violates the rule that its size must be either as if the flexible array member were omitted or that size plus more padding.
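The relationship between the two sizes can be stated as a compile-time check. This is only a sketch; the function name `s1_bytes_after_members` is hypothetical, and the concrete values mentioned afterwards assume the four-byte `int` described above.

```c
#include <stddef.h>

struct s0 {
    int  i;
    char c;
};

struct s1 {
    int  i;
    char c;
    char a[];
};

/* The quoted rule implies that adding a flexible array member
 * cannot make the structure smaller. */
_Static_assert(sizeof(struct s1) >= sizeof(struct s0),
               "adding a flexible array member must not shrink the struct");

/* Bytes included in sizeof(struct s1) that lie at or after the start of a. */
static size_t s1_bytes_after_members(void)
{
    return sizeof(struct s1) - offsetof(struct s1, a);
}
```

On an implementation that pads both structures to eight bytes and places `a` at offset 5, `s1_bytes_after_members()` would be 3.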

This tells us why a conforming compiler is constrained to include the padding. However, it does not tell us the reason for the rule. I see none except that it would be more complicated to write rules into the C standard to allow less padding.

Some Discussion About Object Size

Review of the C 2018 standard reveals nothing which explicitly says the size of an object must be a multiple of its alignment requirement. Obviously, the ability to put objects into an array depends on this, but the lack of a requirement that the size be a multiple of an alignment requirement would mean there might be some objects (besides a structure with flexible array member) that could not be used in arrays; the inability to put objects into an array would not cause the requirement to come into existence.

Thus, it might be conforming for a C implementation to define struct s0 to be five bytes with an alignment requirement of four bytes, and then it could make struct s1 also five bytes with an alignment requirement of four bytes.

Eric Postpischil
  • There is nothing in the definition of `struct s0` which precludes it being an array element and `sizeof` must return the difference between the addresses of two consecutive elements of an array of that type. (Even if there is no such array in this TU, there might be one in another TU in the same program.) So I don't see how a conforming compiler could allow `sizeof(struct s0)` to be 5. – rici Mar 11 '22 at 21:37
  • @rici: Re “There is nothing in the definition of `struct s0` which precludes it being an array element”. So what? There is nothing in the C standard that says it must be possible to make an array from any object. It may not be precluded by the definition, but neither is it mandated. A C implementation could choose to make it 5 bytes with an alignment of 4. Your assertion that “`sizeof` must return the difference between the addresses of two consecutive elements of an array of that type” presumes it must be possible to form an array from that type. The C standard does not assert that. – Eric Postpischil Mar 11 '22 at 21:52
  • @EricPostpischil It would have to require some special `__attribute__((no_tail_padding))` because otherwise it would have to do it to every single struct. Nothing would be usable in an array. – Goswin von Brederlow Mar 11 '22 at 23:43
  • That might come as a surprise to the many C programmers who casually assume that when we write `malloc(n * sizeof(struct Object))` the result will be sufficient to hold `n` suitably aligned `struct Object`s. – rici Mar 12 '22 at 00:01
  • @rici: Yes, it may come as a surprise. However, this question is tagged language-lawyer. The question is not what is unsurprising behavior or what does almost everybody assume. The question is what does the C standard **actually say**. So show us some text in the C standard that says it is necessarily possible that each complete object type other than a structure with a flexible array member can be an element type for an array or some text that says the size of an object must be a multiple of its alignment requirement. – Eric Postpischil Mar 12 '22 at 00:23
  • This question was (explicitly) considered by the Standards committee in their response to Defect Report #074 (1993), which asked (among other things) "can the expression `sizeof (t) % A(t)` be non-zero?" (The DR uses A(t) to mean what would now be written `alignof(t)`.) The Committee's response was "sizeof (t) must indeed be a multiple of A(t)." (http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_074.html). One of the other things they note in the response is that "In several places the C Standard states that a single object may be treated as an array of one element...." I'll leave it at that. – rici Mar 12 '22 at 02:21
  • @rici: Okay, the C committee wanted object sizes to be multiples of their alignment requirements, like everybody else assumes. Nonetheless, no text to that effect appears in the C standard. – Eric Postpischil Mar 12 '22 at 10:56