
Consider the following three structs:

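```cpp
struct blub {
    int i;
    char c;

    blub() = default;
    // User-provided copy constructor: makes blub not trivially copyable (see footnote 2).
    blub(const blub&) {}
};

struct blob {
    char s;

    blob() = default;
    // User-provided copy constructor: makes blob not trivially copyable.
    blob(const blob&) {}
};

struct bla {
    blub b0;
    blob b1;
};
```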

On typical platforms where int is 4 bytes, the sizes, alignments and total padding¹ are as follows:

| struct | size | alignment | padding |
|--------|-----:|----------:|--------:|
| blub   |    8 |         4 |       3 |
| blob   |    1 |         1 |       0 |
| bla    |   12 |         4 |       6 |
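The table can be checked at compile time. The sketch below assumes a typical ABI (4-byte, 4-aligned `int`, e.g. x86-64 with GCC or Clang); none of these numbers is guaranteed by the standard. It recomputes the padding column using the definition in footnote 1 and also confirms footnote 2 (the copy constructors make the types non-trivially copyable). The defaulted default constructors are an addition so the types can actually be instantiated.

```cpp
#include <cstddef>      // std::size_t, offsetof
#include <type_traits>  // std::is_trivially_copyable

struct blub {
    int i;
    char c;
    blub() = default;
    blub(const blub&) {}  // user-provided: not trivially copyable
};

struct blob {
    char s;
    blob() = default;
    blob(const blob&) {}  // user-provided: not trivially copyable
};

struct bla {
    blub b0;
    blob b1;
};

// Footnote 1: padding = sizeof(struct) - sum of the sizes of all leaf members.
constexpr std::size_t blub_padding = sizeof(blub) - (sizeof(int) + sizeof(char));
constexpr std::size_t blob_padding = sizeof(blob) - sizeof(char);
constexpr std::size_t bla_padding  = sizeof(bla) - (sizeof(int) + sizeof(char) + sizeof(char));

// Typical-platform values from the table (implementation-specific).
static_assert(sizeof(blub) == 8 && alignof(blub) == 4 && blub_padding == 3, "blub row");
static_assert(sizeof(blob) == 1 && alignof(blob) == 1 && blob_padding == 0, "blob row");
static_assert(sizeof(bla) == 12 && alignof(bla) == 4 && bla_padding == 6, "bla row");

// b1 starts after all sizeof(blub) bytes of b0, not inside b0's tail padding.
static_assert(offsetof(bla, b1) == sizeof(blub), "b1 follows b0");

// Footnote 2: none of the types is trivially copyable.
static_assert(!std::is_trivially_copyable<blub>::value, "blub is not TC");
static_assert(!std::is_trivially_copyable<blob>::value, "blob is not TC");
static_assert(!std::is_trivially_copyable<bla>::value, "bla is not TC");
```

Because every check is a `static_assert`, simply compiling the snippet verifies the table on the target platform.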

There is no overlap between the storage of the blub and blob members, even though the 1-byte blob could in principle "fit" in the 3 bytes of tail padding of blub.

C++20 introduces the no_unique_address attribute, which allows adjacent empty members to share the same address. It also explicitly allows the scenario described above of using padding of one member to store another. From cppreference (emphasis mine):

> Indicates that this data member need not have an address distinct from all other non-static data members of its class. This means that if the member has an empty type (e.g. stateless Allocator), the compiler may optimise it to occupy no space, just like if it were an empty base. **If the member is not empty, any tail padding in it may be also reused to store other data members.**

Indeed, if we use this attribute on blub b0, the size of bla drops to 8, so the blob is stored in the tail padding of the blub, as seen on godbolt.
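A minimal sketch of that experiment, assuming GCC or Clang on an Itanium-C++-ABI target (e.g. x86-64 Linux). There `blub` is not "POD for the purpose of layout" (it has a user-provided copy constructor), so its tail padding is eligible for reuse by a following member once `b0` is marked `[[no_unique_address]]`. The name `bla_nua` is hypothetical. Since the attribute only *permits* overlap, the portable check is that it does not make the struct larger; the 8-byte result is compiler-specific.

```cpp
struct blub {
    int i;
    char c;
    blub() = default;
    blub(const blub&) {}  // non-trivial copy: tail padding reusable in the Itanium ABI
};

struct blob {
    char s;
    blob() = default;
    blob(const blob&) {}
};

// Ordinary members: b1 is placed after all sizeof(blub) bytes of b0.
struct bla {
    blub b0;
    blob b1;
};

// C++20: b0 becomes a potentially-overlapping subobject, so the compiler may
// place b1 in b0's tail padding (typically sizeof(bla_nua) == 8 with GCC/Clang).
struct bla_nua {
    [[no_unique_address]] blub b0;
    blob b1;
};

// A compiler that does not act on the attribute (e.g. MSVC, which honors only
// [[msvc::no_unique_address]]) keeps the 12-byte layout.
static_assert(sizeof(bla_nua) <= sizeof(bla), "no_unique_address should not grow the struct");
```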

Finally, we get to my question:

What text in the standards (C++11 through C++20) prevents this overlapping without no_unique_address, for objects that are not trivially copyable?

I need to exclude trivially copyable (TC) objects from the above, because for TC objects, it is allowed to std::memcpy from one object to another, including member subobjects, and if the storage were overlapped this would break (because all or part of the storage for the adjacent member would be overwritten)².
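To make the memcpy argument concrete: a byte-wise copy of a TC object writes all `sizeof(T)` bytes, tail padding included. The sketch below models the hypothetical overlapped layout with a raw byte buffer (so the model itself is well defined); `tc`, `tail_begin` and `neighbor_after_copy` are illustrative names, and the tail-padding assumption holds on typical platforms.

```cpp
#include <cstddef>      // std::size_t, offsetof
#include <cstring>      // std::memcpy, std::memset
#include <type_traits>  // std::is_trivially_copyable

// A trivially copyable mirror of blub (no user-provided copy constructor).
struct tc {
    int i;
    char c;
};
static_assert(std::is_trivially_copyable<tc>::value, "tc is trivially copyable");

// First byte of tc's tail padding (offset 5 on typical platforms).
constexpr std::size_t tail_begin = offsetof(tc, c) + sizeof(char);
static_assert(tail_begin < sizeof(tc), "assumes tc has tail padding on this platform");

// Hypothetical layout: a tc at offset 0 and a 1-byte neighbor living in tc's
// tail padding, modeled as plain bytes.
unsigned char neighbor_after_copy() {
    unsigned char storage[sizeof(tc)] = {};
    storage[tail_begin] = 0x5A;  // the neighbor's value

    unsigned char source[sizeof(tc)];  // stands in for another tc's bytes
    std::memset(source, 0xFF, sizeof source);

    // Copying a tc subobject byte-wise writes all sizeof(tc) bytes,
    // padding included, so the neighbor is overwritten.
    std::memcpy(storage, source, sizeof(tc));
    return storage[tail_begin];  // now 0xFF, no longer 0x5A
}
```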


¹ We calculate padding simply as the difference between the structure size and the sum of the sizes of all its constituent members, recursively.

² This is why I have defined copy constructors: to make blub and blob not trivially copyable.

BeeOnRope
  • I haven't researched it, but I'm *guessing* the "as if" rule. If there is no observable difference (a term with very specific meaning btw) to the abstract machine (which is what your code is compiled against) then the compiler can change the code however it likes. – Jesper Juhl Jan 22 '20 at 20:25
  • Pretty sure this is a dupe of this: https://stackoverflow.com/questions/53837373/standard-layout-and-tail-padding – NathanOliver Jan 22 '20 at 20:27
  • @JesperJuhl - right, but I'm asking _why can't it_, not _why can it_, and the "as if" rule usually applies to the former but doesn't make sense for the latter. Also, "as if" isn't clear for structure layout which is usually a global concern, not a local one. Ultimately the compiler has to have a single consistent set of rules for layout, except perhaps for structures it can prove never "escape". – BeeOnRope Jan 22 '20 at 20:28
  • @NathanOliver - no, although it happened to answer exactly another question I had! That question is all about inheritance and padding re-use, which is allowed. This question is the opposite: there is no inheritance, only composition, and padding re-use is apparently not allowed. The question is _why_ (especially given that it is allowed in the inheritance case, which maybe I should note)? – BeeOnRope Jan 22 '20 at 20:30
  • @BeeOnRope I cannot answer your question, sorry. Which is why I just posted a comment and not an answer. What you got in that comment was my best guess towards an explanation, but I don't *know* the answer (curious to learn it myself - which is why you got an upvote). – Jesper Juhl Jan 22 '20 at 20:32
  • Ah yes, My eyes deceived me. Not a dupe. Glad it at least answered another question you had :) – NathanOliver Jan 22 '20 at 20:37
  • @NathanOliver - yes, and saved me from writing it up :). – BeeOnRope Jan 22 '20 at 20:44
  • @BeeOnRope: "*I need to exclude trivially copyable (TC) objects from the above*" You can't. That's a property of the *object*, not the *type*. Your attempt to write a function that can statically and perfectly detect when it is legal to perform trivial copies on any object it is given will only end in tears. It is not possible. – Nicol Bolas Jan 22 '20 at 20:47
  • @NicolBolas - are you replying to the right question? This is not about detecting safe copies or anything else. Rather I am curious why padding can't be re-used between members. In any case, you are wrong: _trivially copyable_ is a [property of the type](https://timsong-cpp.github.io/cppwp/n4659/class#6) and always has been. However, to safely copy an object it must _both_ have a TC type (a property of the type), and not be a potentially-overlapping subobject (a property of the object, which I guess is where you got confused). Still don't know why we are talking about copies here tho. – BeeOnRope Jan 22 '20 at 20:54
  • Said another way, I am excluding TC objects _from the question_, because if I don't, someone can point out that the byte-wise object representation/memcpy stuff would prevent overlapping members, and I agree. So I am curious why it doesn't prevent for non-TC objects. – BeeOnRope Jan 22 '20 at 20:57
  • So, are you asking what prevents the compiler from reusing the padding space between members that it knows are non-trivially copyable? – Acorn Jan 22 '20 at 21:01
  • Correct. For example, the same type of re-use shown in the `[[no_unique_address]]` example. `[[no_unique_address]]` explicitly allows this (in [a note](http://eel.is/c++draft/dcl.attr.nouniqueaddr#2)) but it isn't clear to me why this wasn't already possible (for non-TC types). – BeeOnRope Jan 22 '20 at 21:05

1 Answer


The standard is awfully quiet when it comes to the memory model and is not very explicit about some of the terms it uses. But I think I have found a working argument (though it may be a bit weak):

First, let's find out what is even part of an object. [basic.types]/4:

> The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object of type T is the set of bits that participate in representing a value of type T. Bits in the object representation that are not part of the value representation are padding bits.

So the object representation of b0 consists of sizeof(blub) unsigned char objects, i.e. 8 bytes. The padding bits (here, the 3 bytes after c) are part of the object.

No object can occupy the space of another if it is not nested within it, per [basic.life]/1.5:

> The lifetime of an object o of type T ends when:
>
> [...]
>
> (1.5) the storage which the object occupies is released, or is reused by an object that is not nested within o ([intro.object]).

So the lifetime of b0 would end when the storage it occupies is reused by another object, i.e. b1. I haven't checked this, but I think the standard mandates that the subobjects of an object that is alive are also alive (and I can't imagine how it could work otherwise).

So the storage that b0 occupies may not be used by b1. I have found no definition of "occupy" in the standard, but I think a reasonable interpretation would be "part of the object representation". In the quote describing the object representation, the words "taken up" are used¹. Here, that would be 8 bytes, so bla needs at least one more byte for b1 (which, rounded up to bla's 4-byte alignment, gives a size of 12).
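The byte counting can be written out as a toy model of the usual layout rule: place the next member at the first suitably aligned offset after the bytes the previous member occupies, then round the total up to the largest alignment. This is a simplification, not a real ABI algorithm, and `round_up` and `layout_size` are hypothetical helpers. Reading "occupies" as the full 8-byte object representation reproduces sizeof(bla) == 12; if b0 only occupied its 5 value bytes, the result would be the 8-byte `[[no_unique_address]]` layout.

```cpp
#include <cstddef>  // std::size_t

// Round n up to the next multiple of a (a > 0).
constexpr std::size_t round_up(std::size_t n, std::size_t a) {
    return (n + a - 1) / a * a;
}

// Toy layout of { b0; b1; }: b1 goes right after the bytes b0 is taken to
// occupy (aligned for b1); the total is rounded up to the larger alignment.
constexpr std::size_t layout_size(std::size_t b0_occupied, std::size_t b0_align,
                                  std::size_t b1_size, std::size_t b1_align) {
    return round_up(round_up(b0_occupied, b1_align) + b1_size,
                    b0_align > b1_align ? b0_align : b1_align);
}

// b0 occupies all 8 bytes: b1 at offset 8, end 9, rounded up to 4 -> 12.
static_assert(layout_size(8, 4, 1, 1) == 12, "matches sizeof(bla)");

// b0 occupies only int + char = 5 bytes: b1 at offset 5, end 6, rounded up to 4 -> 8.
static_assert(layout_size(5, 4, 1, 1) == 8, "matches the no_unique_address layout");
```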

For subobjects in particular (which include non-static data members), there is also the stipulation in [intro.object]/9 (though the "occupy disjoint bytes" part was only added in C++20, thanks @BeeOnRope):

> Two objects with overlapping lifetimes that are not bit-fields may have the same address if one is nested within the other, or if at least one is a subobject of zero size and they are of different types; otherwise, they have distinct addresses and **occupy disjoint bytes of storage**.

(emphasis mine) Here again, we have the problem that "occupy" is not defined, and again I would argue for taking it to mean the bytes of the object representation. Note that there is a footnote to this, [basic.memobj]/footnote 29:

> Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference ([intro.execution]).

This may allow the compiler to deviate from the rule if it can prove that there is no observable difference. I would think that is pretty complicated for something as fundamental as object layout. Maybe that is why this optimization is only performed when the user signals, by adding the [[no_unique_address]] attribute, that there is no reason for the objects to be disjoint.
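Under the interpretation argued above ("occupy" = the sizeof(T) bytes of the object representation), the [intro.object]/9 requirement for b0 and b1 can be phrased as a compile-time check on half-open byte ranges. This is a sketch: `disjoint` is a hypothetical helper, and the concrete offsets assume the typical layout from the question.

```cpp
#include <cstddef>  // std::size_t, offsetof

struct blub { int i; char c; blub() = default; blub(const blub&) {} };
struct blob { char s; blob() = default; blob(const blob&) {} };
struct bla { blub b0; blob b1; };

// True if [a_off, a_off + a_size) and [b_off, b_off + b_size) share no byte.
constexpr bool disjoint(std::size_t a_off, std::size_t a_size,
                        std::size_t b_off, std::size_t b_size) {
    return a_off + a_size <= b_off || b_off + b_size <= a_off;
}

// The actual layout satisfies the requirement: b0 is [0, 8), b1 is [8, 9).
static_assert(disjoint(offsetof(bla, b0), sizeof(blub),
                       offsetof(bla, b1), sizeof(blob)),
              "b0 and b1 occupy disjoint bytes");

// Putting b1 in b0's tail padding (offset 5) would violate it.
static_assert(!disjoint(0, sizeof(blub), 5, sizeof(blob)),
              "tail-padding reuse would overlap b0's object representation");
```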

tl;dr: Padding may be part of the object, and members have to be disjoint.


¹ I could not resist adding a reference showing that "occupy" may mean "take up": Webster’s Revised Unabridged Dictionary, G. & C. Merriam, 1913 (emphasis mine)

> 1. To hold, or fill, the dimensions of; **to take up the room or space of**; to cover or fill; as, the camp occupies five acres of ground. Sir J. Herschel.

What standard crawl would be complete without a dictionary crawl?

n314159
  • The "occupy disjoint bytes of storage" part from intro.object would be enough, I think, for me - but this wording was only added in C++20 as part of the change that added `no_unique_address`. It leaves the situation prior to C++20 less clear. I didn't understand your reasoning leading to "No object can occupy the space of another if it is not nested within it" from basic.life/1.5, in particular how to get from "the storage which the object occupies is released" to "no object can occupy the space of another". – BeeOnRope Jan 22 '20 at 22:58
  • I added a small clarification to that paragraph. I hope that makes it more understandable. Otherwise I will look at it again tomorrow, right now it is pretty late for me. – n314159 Jan 22 '20 at 23:53
  • _"Two objects with overlapping lifetimes that are not bit-fields may have the same address if one is nested within the other, or if at least one is a subobject of zero size and they are of different types"_ [2 objects with overlapping lifetimes, of the same type, have the same address](https://wandbox.org/permlink/Z2qSJMBs72mlfqjw). – Language Lawyer Jan 23 '20 at 15:24
  • Sorry, could you elaborate? You are quoting a passage of the standard from my answer and giving an example that somewhat conflicts with it. I am unsure whether this is a comment on my answer and, if so, what it is meant to tell me. Regarding your example, I would say that one has to consider still other parts of the standard (there is a paragraph about an unsigned char array providing storage for another object, something regarding the zero-size base optimization, and one should also check whether placement new has special allowances; none of these do I think are relevant to OP's example). – n314159 Jan 23 '20 at 15:35
  • @n314159 I think this wording might be defective. – Language Lawyer Jan 23 '20 at 17:08