With both x86-64 GCC 13.1 and Clang 16.0.0, the copy<PrivateBase> function is compiled to a member-wise copy, while the copy<PublicBase> function is compiled to a bit-wise copy. The full source and assembly are on Compiler Explorer; the relevant snippets are below:

class PublicBase {
public:
    int num;
    char c1;
};

class PrivateBase {
private:
    int num;
    char c1;
};


template<typename T>
__attribute__((noinline)) void copy(T *dst, T *src) {
    *dst = *src;
}

template void copy(PublicBase *dst, PublicBase *src);
template void copy(PrivateBase *dst, PrivateBase *src);

void copy<PublicBase>(PublicBase*, PublicBase*):
        mov     rax, QWORD PTR [rsi]
        mov     QWORD PTR [rdi], rax
        ret
void copy<PrivateBase>(PrivateBase*, PrivateBase*):
        mov     eax, DWORD PTR [rsi]
        mov     DWORD PTR [rdi], eax
        movzx   eax, BYTE PTR [rsi+4]
        mov     BYTE PTR [rdi+4], al
        ret

The question is, when does the default copy assignment operator from C++11 use bit-wise copy instead of member-wise copy? It seems that neither is_trivially_copyable nor is_pod provides the answer.

is_trivially_copyable

According to cppreference-is_trivially_copyable:

Objects of trivially-copyable types that are not potentially-overlapping subobjects are the only C++ objects that may be safely copied with std::memcpy.

Both PublicBase and PrivateBase are trivially copyable, and the objects being copied are not subobjects, yet PrivateBase is copied member-wise instead of bit-wise.
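
The trivially-copyable claim can be checked at compile time. This is a minimal sketch that repeats the two class definitions from above; the helper name both_trivially_copyable is hypothetical and only there to make the check reusable:

```cpp
#include <type_traits>

class PublicBase {
public:
    int num;
    char c1;
};

class PrivateBase {
private:
    int num;
    char c1;
};

// Hypothetical helper: true when both classes are trivially copyable.
constexpr bool both_trivially_copyable() {
    return std::is_trivially_copyable<PublicBase>::value &&
           std::is_trivially_copyable<PrivateBase>::value;
}

// Access control of the data members does not affect trivial copyability,
// so both assertions are expected to hold.
static_assert(std::is_trivially_copyable<PublicBase>::value,
              "PublicBase should be trivially copyable");
static_assert(std::is_trivially_copyable<PrivateBase>::value,
              "PrivateBase should be trivially copyable");
```

So the trait alone does not distinguish the two classes, which matches the observation that it cannot explain the different assembly.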

is_pod

If a class derives from PublicBase or PrivateBase, the class derived from PrivateBase reuses the tail padding of its base class, while the one derived from PublicBase does not.

Therefore, it is reasonable that PrivateBase is copied member-wise. Otherwise, copying the base class's padding could overwrite PrivateDerived::c2 when calling copy<PrivateBase>(derived, base).


#include <iostream>

class PublicDerived : public PublicBase {
public:
    char c2;
};

class PrivateDerived : public PrivateBase {
private:
    char c2;
};


int main() {
    std::cout << "sizeof(PublicBase)=" << sizeof(PublicBase) << std::endl;
    std::cout << "sizeof(PublicDerived)=" << sizeof(PublicDerived) << std::endl;
    std::cout << "sizeof(PrivateBase)=" << sizeof(PrivateBase) << std::endl;
    std::cout << "sizeof(PrivateDerived)=" << sizeof(PrivateDerived) << std::endl;

    return 0;
}
// Output:
// sizeof(PublicBase)=8
// sizeof(PublicDerived)=12
// sizeof(PrivateBase)=8
// sizeof(PrivateDerived)=8

I am confused about how the compiler decides to reuse padding of the base class or not. According to the related question, the POD type doesn't reuse padding of the base class.

According to the cppreference-POD_class:

A POD class is a class that

  • until C++11:
    • is an aggregate (no private or protected non-static data members),
    • has no user-declared copy assignment operator,
    • has no user-declared destructor, and
    • has no non-static data members of type non-POD class (or array of such types) or reference.
  • since C++11
    • is a trivial class,
    • is a standard-layout class (has the same access control for all non-static data members), and
    • has no non-static data members of type non-POD class (or array of such types).

Before C++11, PrivateBase is not a POD type (because it has private data members), but since C++11 it is a POD type (because all of its non-static data members have the same access control).


#include <iostream>
#include <type_traits>

int main() {
    std::cout << "PublicBase: is_standard_layout=" << std::is_standard_layout<PublicBase>::value
              << ", is_trivial=" << std::is_trivial<PublicBase>::value
              << ", is_pod=" << std::is_pod<PublicBase>::value << std::endl;

    std::cout << "PrivateBase: is_standard_layout=" << std::is_standard_layout<PrivateBase>::value
              << ", is_trivial=" << std::is_trivial<PrivateBase>::value
              << ", is_pod=" << std::is_pod<PrivateBase>::value << std::endl;
}
// Output:
// PublicBase: is_standard_layout=1, is_trivial=1, is_pod=1
// PrivateBase: is_standard_layout=1, is_trivial=1, is_pod=1
Zihe Liu
    In standard C++, there are no guarantees at all because you cannot understand what is happening from inside the program without inspecting the assembly. Compiler can do whatever it wants under the "as-if rule", it can even emit a C++ interpreter with the source code embedded. However, the question still makes sense for specific version of Clang. – yeputons Jun 25 '23 at 13:47

1 Answer

The question is, when does the default copy assignment operator from C++11 use bit-wise copy instead of member-wise copy? It seems that neither is_trivially_copyable nor is_pod provides the answer.

First minor correction on terminology: You probably mean the implicitly-defined copy assignment operator. This is different from implicitly-declared copy assignment operator and defaulted or explicitly-defaulted copy assignment operator.
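
To make the terminology concrete, here is a small sketch of the three cases. The struct names Implicit, Defaulted and UserProvided are made up for illustration and do not appear in the question:

```cpp
#include <type_traits>

// The copy assignment operator is implicitly declared and, once it is used,
// implicitly defined by the compiler.
struct Implicit {
    int num;
    char c1;
};

// Explicitly defaulted on its first declaration: user-declared, but not
// user-provided, so it is still trivial.
struct Defaulted {
    int num;
    char c1;
    Defaulted& operator=(const Defaulted&) = default;
};

// User-provided: never trivial, even though it performs the same
// member-wise copy as the two cases above.
struct UserProvided {
    int num;
    char c1;
    UserProvided& operator=(const UserProvided& other) {
        num = other.num;
        c1 = other.c1;
        return *this;
    }
};

static_assert(std::is_trivially_copy_assignable<Implicit>::value,
              "implicitly-defined operator= is trivial here");
static_assert(std::is_trivially_copy_assignable<Defaulted>::value,
              "defaulted-on-first-declaration operator= is trivial here");
static_assert(!std::is_trivially_copy_assignable<UserProvided>::value,
              "user-provided operator= is never trivial");
```

All three operators copy num and c1 member by member on the abstract machine; they differ only in who wrote the definition and in whether the operator counts as trivial.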

The implicitly-defined copy assignment operator always uses member-wise copy, except for unions, for which the object representation is copied instead (i.e. byte-wise as if by memcpy).

However, the value of padding is unspecified, so the compiler doesn't need to care about overwriting it if it knows that it is indeed only padding, i.e. not reused for members of derived classes.

Then, if the compiler knows that the assignment operator is equivalent to copying the members' object representations directly, e.g. if the copy assignment operator is trivial, it can replace the member-wise copy with a copy of the object representation of the whole object. This wouldn't affect any observable behavior, since the only difference, the resulting padding values, is unspecified anyway. Even if the copy assignment is not trivial, the compiler might see, e.g. after inlining, that the observable behavior wouldn't be affected by this optimization. Anything is permitted as long as the observable behavior doesn't change to one that wasn't permitted on the abstract machine ("as-if" rule).
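
As a rough illustration of why the two strategies are interchangeable for a complete object, the sketch below copies the same value both ways and compares only the members. The type Pair and all function names are hypothetical; they merely mirror the layout of PublicBase:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for PublicBase: an int, a char, then tail padding.
struct Pair {
    int num;
    char c1;
};

// Member-wise copy: what the implicitly-defined operator= does on the
// abstract machine. Padding bytes of dst are left untouched.
void copy_memberwise(Pair& dst, const Pair& src) {
    dst.num = src.num;
    dst.c1 = src.c1;
}

// Copy of the whole object representation, padding included. This is fine
// here because dst is a complete object of a trivially copyable type.
void copy_bytes(Pair& dst, const Pair& src) {
    std::memcpy(&dst, &src, sizeof(Pair));
}

// Compares member values only; padding is deliberately ignored.
bool same_members(const Pair& a, const Pair& b) {
    return a.num == b.num && a.c1 == b.c1;
}

// Both strategies must leave the members in the same state.
void check_copies_agree() {
    Pair src{42, 'x'};
    Pair a{0, '\0'};
    Pair b{0, '\0'};
    copy_memberwise(a, src);
    copy_bytes(b, src);
    assert(same_members(a, src));
    assert(same_members(b, src));
    assert(same_members(a, b));
}
```

The two results can differ only in their padding bytes, which a conforming program cannot rely on, so a compiler is free to pick whichever form is cheaper.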

I am confused about how the compiler decides to reuse padding of the base class or not. According to the related question, the POD type doesn't reuse padding of the base class.

This is not specified by the standard. It is up to the compiler to decide under which circumstances padding is reused and that does not need to coincide with the POD property. In fact the POD concept is deprecated and not used by current standard versions any more except for the deprecated is_pod type trait.

Moreover, the standard says that every base class subobject is potentially-overlapping. This property is used to define whether copying a trivially-copyable object by memcpy is permitted, and because every base class subobject is potentially-overlapping, the standard, in theory, allows the tail padding of any class to be reused. However, doing so would break C compatibility for class types that are also valid C structs, so a compiler isn't going to be that aggressive.

Because reuse of padding affects ABI compatibility between translation units, there will however be a general rule that the compiler will follow to maintain binary compatibility between translation units. Usually there is an ABI specification for the compiler/platform combination.

GCC and Clang follow the Itanium C++ ABI, which specifies the concept of POD for the purpose of layout, explicitly based on the POD definition from the C++03 standard, excluding some special cases and with some clarifications. This concept, not the C++ standard's concept of "POD", is used to decide whether tail padding is reused in the Itanium C++ ABI.

In C++03 PublicBase was POD, but PrivateBase wasn't, and so the former is POD for the purpose of layout, while the latter isn't. Consequently GCC and Clang reuse tail padding only for the latter, PrivateBase.

When tail padding is potentially reused, the compiler can't copy the whole object representation for the implicit copy assignment operator, because that would potentially modify a byte of a derived class's member, as you already noticed. That would potentially affect the observable behavior and therefore would not be covered under "as-if".
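
The guarantee the compiler has to preserve can be sketched as follows. The constructors, getters and the two free functions are additions for illustration only (the classes in the question have none), and whether c2 actually lives in the base's tail padding depends on the ABI:

```cpp
#include <cassert>

class PrivateBase {
public:
    // Constructor and getters are hypothetical additions for this sketch.
    PrivateBase(int n, char c) : num(n), c1(c) {}
    int get_num() const { return num; }
    char get_c1() const { return c1; }
private:
    int num;
    char c1;
};

class PrivateDerived : public PrivateBase {
public:
    PrivateDerived(int n, char c, char d) : PrivateBase(n, c), c2(d) {}
    char get_c2() const { return c2; }
private:
    char c2;  // under the Itanium C++ ABI this may sit in PrivateBase's tail padding
};

// Assigns through base references, so only PrivateBase::operator= runs.
// It may modify num and c1, but never a member of the derived class.
void assign_base_part(PrivateBase& dst, const PrivateBase& src) {
    dst = src;
}

// The derived member must survive an assignment to the base subobject,
// regardless of how the ABI lays the two classes out.
void check_derived_member_survives() {
    PrivateDerived derived(1, 'a', 'z');
    PrivateBase base(2, 'b');
    assign_base_part(derived, base);
    assert(derived.get_num() == 2);
    assert(derived.get_c1() == 'b');
    assert(derived.get_c2() == 'z');
}
```

If the base assignment were compiled as a copy of all sizeof(PrivateBase) bytes, then on a layout that reuses the tail padding the last check could fail. That is why a member-wise copy is emitted for copy<PrivateBase>.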

user17732522