Why do compilers (e.g. gcc) deal with the memory layout of derived classes in this way?

Question

Here is my C++ code.

```cpp
#include <iostream>
using namespace std;

class A {
public:
    int val;
    char a;
};

class B: public A {
public:
    char b;
};

class C: public B {
public:
    char c;
};

int main()
{
    cout << sizeof(A) << endl;
    cout << sizeof(B) << endl;
    cout << sizeof(C) << endl;

    return 0;
}
```

The output of the program (in gcc) is:

8
12
12

This output confuses me a lot.

I know that alignment is probably the reason why sizeof(A) equals 8 (sizeof(int) + sizeof(char) + 3 bytes of padding).

I also guess that sizeof(B) grows to 12 (sizeof(B) == sizeof(A) + sizeof(char) + 3 bytes of padding) to avoid overlap when a copy occurs. Is that right?

But what I really don't understand is why sizeof(B) is equal to sizeof(C).

Thanks a lot.

The sizes, including the paddings, are in bytes, not in bits. — eerorika, May 24 '14 at 06:32
Not related to runtime-debug prints, a VC++ compiler switch that ponies up the actual object-structural layout, vtables, virtual-bases, et-al, is incredibly educational. [See this question](http://stackoverflow.com/questions/2138890/layout-of-compiled-objects) for details on how it is done for that platform. I cannot say with experience whether something similar exists for g++, but I would be somewhat surprised if it did *not*. — WhozCraig, May 24 '14 at 07:01
It might be instructive to print the offsets of the variables with `cout << "Offset of 'val': " << (int)(&((C*)0)->val) << " bytes.\n";` etc. — cmaster - reinstate monica, May 24 '14 at 07:43

pentadecagon · Accepted Answer · 2014-05-24T10:20:20.297


Both GCC and Clang follow the Itanium C++ ABI document, which specifies:

... implementations may freely allocate objects in the tail padding of any class which would not have been POD in C++98

Class A is POD, so the compiler cannot put anything into its tail padding: b has to go after all 8 bytes of A, which makes sizeof(B) 12. Class B isn't POD, so the compiler is free to reuse the tail padding of the base class layout for members of derived classes: c lands in the padding that follows b, and sizeof(C) stays at 12. The basic idea here was that the C++ class layout should mirror the equivalent C struct layout for POD types, but there is no such limitation for other classes. Because the meaning of "POD" has changed multiple times, they explicitly use the definition from C++98.

EDIT: About the rationale. POD types are very simple classes that could be implemented as a struct in C. For those types the layout should be identical to the layout a C compiler would create. In particular they want to allow C tools like memcpy for A. If char b; were placed within the padding of A, a memcpy over the A part of a B would overwrite it.

edited May 24 '14 at 10:20

answered May 24 '14 at 07:11


Thanks for your answer. But can you explain what this rule is for? – Wizmann May 24 '14 at 09:37
I added some text to better explain the rationale. Hope this helps. – pentadecagon May 24 '14 at 09:52
What is the difference between ``B`` and ``C`` which makes ``C`` non-POD with respect to C++98? Both are publicly inheriting from a POD-class and containing only POD members. Or wait, is ``B`` already non-POD? In that case I just misunderstood the quote from the standard (in that case, some more context and clarification of your edit would be helpful). – Jonas Schäfer May 24 '14 at 09:54
@Jonas `A` is POD, as a result the padding within `A` must not be used. `B` is non-POD, so the padding within `B` is fair game for subsequent objects. But see the problem. Better now? – pentadecagon May 24 '14 at 10:09
So ``B`` is not POD, yet the compiler creates a POD-like layout? (being a bit confused here) – Jonas Schäfer May 24 '14 at 10:10
@Jonas No, it has nothing to do with `B`, it's the `A` they want to protect. They want to allow ugly C-tools like `memcpy` for `A`. – pentadecagon May 24 '14 at 10:17
Ah now I get it. I didn’t realize that for protecting the layout of ``A``, you might want to keep the padding clear of data. Thanks for your effort to explain that to me! – Jonas Schäfer May 24 '14 at 10:18
@jonas: The rule is more general. The derived class can never change the layout of the base class. The consequences would be horrible. So `c` cannot change the offset of `b` either. Only the tail padding is usable. – david.pfx May 24 '14 at 10:37
Yeah, that makes sense, @david.pfx. I just had a hard time to realize why it makes sense to even protect the tail padding of ``A`` here (so that in C, you can ``memcpy`` over the ``A``-part of a ``B`` struct). – Jonas Schäfer May 24 '14 at 10:40
@david.pfx Even more general: derived classes can change the layout of base classes only if there are virtual base classes within the mix. – pentadecagon May 24 '14 at 10:43
@pentadecagon: I don't see how. The offset of each member within its class must be preserved regardless of subsequent derivations. Can you provide an example that shows otherwise? – david.pfx May 24 '14 at 11:18
@pentadecagon thank you very much for your great great answer. I learn a lot. :) – Wizmann May 24 '14 at 12:15
@pentadecagon: By layout I mean the offset and size of each member within its enclosing struct/class. That doesn't change, and is usually quite predictable, even if not a _standard-layout-class_. Your example shows something different: offset of an inherited member. For non-SLC base class that's UB and for virtual not easily predictable either. – david.pfx May 25 '14 at 01:06
Re POD: actually the standard uses the term _standard-layout-class_. An SLC may have non-trivial constructors and certain operators, but otherwise behaves as the POD described here. POD is more restrictive. – david.pfx May 25 '14 at 01:10
