I have several classes that are based on the PIMPL idiom (where a unique_ptr refers to the actual implementation struct).

I haven't added a friend swap function (as described here) because, to my knowledge, the standard std::swap uses move semantics, which would nicely swap the unique_ptrs. So far, so good.

However, I read the (somewhat outdated) Effective C++ by Scott Meyers, which says in Item 25:

However, the default swap implementation might not thrill you. It involves copying three objects: a to temp, b to a, and temp to b. [...] For some types, the default swap puts you on the fast track to the slow lane. Foremost among such types are those consisting primarily of a pointer to another type that contains the real data. A common manifestation of this design is the "pimpl" idiom.

He then also suggests specializing std::swap.

My question is whether this still holds in C++11. It seems that the C++11 swap works just fine for pimpl'd classes. I understand that adding a friend swap lets the standard library find it via argument-dependent lookup, but I prefer to keep my classes as lean as possible.

allyourcode
Ben

2 Answers

My question is whether this still holds in C++11.

Only to a much lesser degree.

Since the introduction of move semantics in C++11, the generic swap moves instead of copying.

A move-based swap is often close enough to optimal that writing a custom implementation isn't worth the bother. Still, in many cases (including the PIMPL case, as demonstrated by Daniel Langr) a custom implementation can be faster. Whether the gain is large enough to justify writing one is best determined by measuring the performance.
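For illustration, here is a minimal sketch of what such a custom swap could look like for a PIMPL class. The names Widget, value() and exchange_values are hypothetical and only serve the example; everything is kept in a single file so it compiles on its own, whereas in real code the lower part would live in a separate .cpp file.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical PIMPL class (the header part).
class Widget {
public:
    explicit Widget(int value);
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;
    ~Widget();

    int value() const;

    // Member swap: exchanges only the two owning pointers.
    // unique_ptr::swap does not need Impl to be a complete type.
    void swap(Widget& other) noexcept { pimpl_.swap(other.pimpl_); }

    // Non-member swap, found by argument-dependent lookup.
    friend void swap(Widget& a, Widget& b) noexcept { a.swap(b); }

private:
    struct Impl;
    std::unique_ptr<Impl> pimpl_;
};

// --- what would normally live in the .cpp file ---
struct Widget::Impl { int value; };

Widget::Widget(int value) : pimpl_(new Impl{value}) {}
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;
Widget::~Widget() = default;

int Widget::value() const {
    assert(pimpl_ != nullptr);  // a moved-from Widget has no Impl
    return pimpl_->value;
}

// Generic code calls swap unqualified, so the friend above is preferred
// over the std::swap template when T is Widget.
template <typename T>
void exchange_values(T& a, T& b) {
    using std::swap;
    swap(a, b);
}
```

Calling exchange_values on two Widget objects goes through the friend swap and never touches the out-of-line move operations; for types without their own swap it falls back to std::swap.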

allyourcode
eerorika
I would argue that a custom implementation can be even _significantly_ more optimal in some cases (e.g., see my answer). – Daniel Langr May 19 '20 at 13:48
@DanielLangr Relative difference may be significant, but I would still expect the generic swap to be very fast. It just isn't very, very fast. Whether this makes a difference depends a lot on how much of the time you are going to spend swapping. – eerorika May 19 '20 at 14:07

The problem here might be that with PIMPL implemented via std::unique_ptr, you basically need to define the move constructor, the move assignment operator, and the destructor outside of the header file (see Item 22 of Meyers' Effective Modern C++). std::swap then does not "see" these definitions, so the compiler cannot optimize away unnecessary operations such as nulling a pointer that won't be used anymore or calling operator delete with a null pointer argument. It will just generate 4 call instructions, since it has no other option.

Consider a simple demo for this class:

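#include <memory>   // std::unique_ptr
#include <utility>  // std::swap

class X
{
    public:
        // Declared only; defined in the source file where Impl is complete.
        X(X&&);
        X& operator=(X&&);
        ~X();

        // Swaps just the two owning pointers.
        void swap(X& other) { std::swap(pimpl_, other.pimpl_); }

    private:
        class Impl;
        std::unique_ptr<Impl> pimpl_;
};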

The assembly generated by GCC with -O3 for swapping two X objects with std::swap was as follows:

f1(X&, X&):
        push    r12
        mov     r12, rdi
        push    rbp
        mov     rbp, rsi
        mov     rsi, rdi
        sub     rsp, 24
        lea     rdi, [rsp+8]
        call    X::X(X&&)
        mov     rsi, rbp
        mov     rdi, r12
        call    X::operator=(X&&)
        lea     rsi, [rsp+8]
        mov     rdi, rbp
        call    X::operator=(X&&)
        lea     rdi, [rsp+8]
        call    X::~X() [complete object destructor]
        add     rsp, 24
        pop     rbp
        pop     r12
        ret
        mov     rbp, rax
        jmp     .L2
f1(X&, X&) [clone .cold]:
.L2:
        lea     rdi, [rsp+8]
        call    X::~X() [complete object destructor]
        mov     rdi, rbp
        call    _Unwind_Resume

The assembly generated for the same operation with X::swap was:

f2(X&, X&):
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rsi]
        mov     QWORD PTR [rdi], rdx
        mov     QWORD PTR [rsi], rax
        ret

The latter is obviously optimal, since it involves only instructions necessary for swapping two regular pointers (hidden behind std::unique_ptr in our case).
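The definitions of f1 and f2 are not shown above; the sketch below assumes they simply wrap the two ways of swapping. The constructor X(int) and the accessor value() are hypothetical additions so that objects can be created and inspected. Everything sits in one file here so the sketch compiles on its own, but the comparison only makes sense when the lower part is in a separate source file that the compiler cannot see while compiling f1.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class X
{
    public:
        explicit X(int v);  // hypothetical, not part of the class above
        X(X&&);
        X& operator=(X&&);
        ~X();

        int value() const;  // hypothetical accessor

        void swap(X& other) { std::swap(pimpl_, other.pimpl_); }

    private:
        class Impl;
        std::unique_ptr<Impl> pimpl_;
};

// Assumed shape of the two measured functions.
// Generic swap: one move construction, two move assignments, one destruction.
void f1(X& a, X& b) { std::swap(a, b); }

// Member swap: exchanges the two pointers directly.
void f2(X& a, X& b) { a.swap(b); }

// --- what would normally live in a separate source file ---
class X::Impl { public: int v; };

X::X(int v) : pimpl_(new Impl{v}) {}
X::X(X&&) = default;
X& X::operator=(X&&) = default;
X::~X() = default;

int X::value() const {
    assert(pimpl_ != nullptr);  // a moved-from X has no Impl
    return pimpl_->v;
}
```

Both functions leave the objects in the same final state; they differ only in how much work the compiler is forced to emit when the special member functions are opaque to it.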


Moreover, even for an empty X::Impl class, the special member functions involved compile to a fair number of instructions themselves:

X::X(X&&):
        mov     rax, QWORD PTR [rsi]
        mov     QWORD PTR [rdi], rax
        mov     QWORD PTR [rsi], 0
        ret
X::operator=(X&&):
        push    r12
        mov     r12, rdi
        mov     rax, QWORD PTR [rsi]
        mov     QWORD PTR [rsi], 0
        mov     rdi, QWORD PTR [rdi]
        mov     QWORD PTR [r12], rax
        test    rdi, rdi
        je      .L7
        mov     esi, 1
        call    operator delete(void*, unsigned long)
.L7:
        mov     rax, r12
        pop     r12
        ret
X::~X() [base object destructor]:
        mov     rdi, QWORD PTR [rdi]
        test    rdi, rdi
        je      .L12
        mov     esi, 1
        jmp     operator delete(void*, unsigned long)
.L12:
        ret
Daniel Langr