29

C++ has a small-size struct calling convention optimization where the compiler passes a small-size struct in function parameters as efficiently as it passes a primitive type (say, via registers). For example:

class MyInt { int n; public: MyInt(int x) : n(x){} };
void foo(int);
void foo(MyInt);
void bar1() { foo(1); }
void bar2() { foo(MyInt(1)); }

bar1() and bar2() generate almost identical assembly code except for calling foo(int) and foo(MyInt) respectively. Specifically on x86_64, it looks like:

        mov     edi, 1
        jmp     foo(MyInt) ;tail-call optimization jmp instead of call ret

But if we test std::tuple<int>, it will be different:

#include <tuple>

void foo(std::tuple<int>);
void bar3() { foo(std::tuple<int>(1)); }

struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };
void foo(MyIntTuple);
void bar4() { foo(MyIntTuple(1)); }

The generated assembly looks totally different; the small struct (std::tuple<int>) is passed by pointer:

        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(std::tuple<int>)
        add     rsp, 24
        ret

I dug a bit deeper and tried to make my int a bit dirtier (this should be close to an incomplete, naive tuple implementation):

class Empty {};
class MyDirtyInt : protected Empty, MyInt {public: using MyInt::MyInt; };
void foo(MyDirtyInt);
void bar5() { foo(MyDirtyInt(1)); }

But the calling convention optimization is still applied:

        mov     edi, 1
        jmp     foo(MyDirtyInt)

I have tried GCC/Clang/MSVC, and they all showed the same behavior. (Godbolt link here) So I guess this must be something in the C++ standard? (I believe the C++ standard doesn't specify any ABI constraint, though?)

I'm aware that the compiler should be able to optimize these out, as long as the definition of foo(std::tuple<int>) is visible and not marked noinline. But I want to know which part of the standard or implementation causes the invalidation of this optimization.

FYI, in case you're curious about what I'm doing with std::tuple, I want to create a wrapper class (i.e. the strong typedef) and don't want to declare comparison operators (operator<==>'s prior to C++20) myself and don't want to bother with Boost, so I thought std::tuple was a good base class because everything was there.

YumeYao
  • I suspect it's the `std::tuple`'s user-defined copy/move constructors that affect the behavior here. But I can't tell you what in the ABI it's interacting with here. – StoryTeller - Unslander Monica Sep 03 '20 at 08:05
  • Normally it’s a nontrivial destructor which causes this behaviour, but the `std::tuple` destructor should be trivial. – Konrad Rudolph Sep 03 '20 at 08:11
  • 1
    @KonradRudolph Anyway, adding user-defined destructor to `MyInt` has the same effect: https://godbolt.org/z/s4zzcx. – Daniel Langr Sep 03 '20 at 12:13
  • @DanielLangr Yes, that’s what I said. It seems that supplying a custom version of *any* of copy constructor, move constructor or destructor causes an ABI change. I guess this makes sense. – Konrad Rudolph Sep 03 '20 at 12:19
  • I think it worth mentioning that this is just another case of what Chandler Carruth talked about in CppCon 2019, "there is no zero-cost abstraction". His example was about `std::unique_ptr` but the cases are very similar. https://youtu.be/rHIkrotSwcc?t=1050 – Yehezkel B. Sep 03 '20 at 13:55
  • 1
    @YehezkelB. I am not sure this is the same case. With libc++, you get passing by registers. It seems to me more like a quality of implementation issue. – Daniel Langr Sep 03 '20 at 13:59
  • OK, not exactly the same, because `tuple` d-tor is trivial, while `unique_ptr`'s isn't, so in `tuple` case it depends on the way the implementation handles copy/move c-tors, but for `unique_ptr` there is no other option. But the reasoning behind it is still similar. – Yehezkel B. Sep 03 '20 at 14:22
  • 1
    duplicates: [Is returning a 2-tuple less efficient than std::pair?](https://stackoverflow.com/q/46901697/995714), [Why is std::pair faster than std::tuple](https://stackoverflow.com/q/26863852/995714) – phuclv Sep 03 '20 at 23:46
  • Does this answer your question? [Is returning a 2-tuple less efficient than std::pair?](https://stackoverflow.com/questions/46901697/is-returning-a-2-tuple-less-efficient-than-stdpair) – phuclv Sep 03 '20 at 23:46
  • 1
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71301 – Marc Glisse Sep 04 '20 at 05:47

2 Answers

12

It seems to be a matter of ABI. For instance, the Itanium C++ ABI reads:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

And, further:

A type is considered non-trivial for the purposes of calls if it has a non-trivial copy constructor, move constructor, or destructor, or all of its copy and move constructors are deleted.

The same requirement is in AMD64 ABI Draft 1.0.

For instance, in libstdc++, std::tuple has a non-trivial move constructor: https://godbolt.org/z/4j8vds. The Standard prescribes that both the copy and the move constructor be defaulted, which is satisfied here. However, tuple inherits from _Tuple_impl, and _Tuple_impl has a user-defined move constructor. Consequently, the move constructor of tuple itself cannot be trivial.

By contrast, in libc++, both the copy and the move constructor of std::tuple<int> are trivial. Therefore, the argument is passed in a register there: https://godbolt.org/z/WcTjM9.

As for Microsoft STL, std::tuple<int> is neither trivially copy-constructible nor trivially move-constructible. This even seems to break the C++ Standard rules. std::tuple is defined recursively and, at the end of the recursion, the std::tuple<> specialization defines a non-defaulted copy constructor. There is a comment about this issue in the source: // TRANSITION, ABI: should be defaulted. Since tuple<> has no move constructor, both the copy and the move constructor of tuple<class...> are non-trivial.

Daniel Langr
  • 1
    So is `libstdc++` non-compliant here? [gasp] – underscore_d Sep 03 '20 at 15:43
  • @underscore_d Why? The Standard only says that the move constructor of `std::tuple` must be defined as defaulted. Which is satisfied by libstdc++. – Daniel Langr Sep 03 '20 at 17:20
  • Your 2nd sentence contradicts your own post. – Maxim Egorushkin Sep 03 '20 at 18:04
  • @MaximEgorushkin Don't understand. In libstdc++, `std::tuple` has defaulted move constructor, according to the Standard. At the same time, `std::tuple` inherits from `_Tuple_impl`. Since `_Tuple_impl` has user-defined move constructor, the move constructor of `std::tuple` isn't trivial. Even if it's defined as defaulted. – Daniel Langr Sep 03 '20 at 18:13
  • 1
    @DanielLangr Oh, you are saying those constructors are defaulted in class `std::tuple` indeed, but not in its base classes. My mistake. – Maxim Egorushkin Sep 03 '20 at 18:38
  • 1
    @MaximEgorushkin I reworded the explanation in the answer to make it hopefully more clear in this context :). – Daniel Langr Sep 03 '20 at 18:42
  • I wonder why `stdlibc++` doesn't `default` the copy and move constructors of the base classes? Make it `constexpr` and conditionally `noexcept`, fine, but `default` the implementation. And now this would be an ABI breaking change. As it stands now it is making mockery out of the standard. – Maxim Egorushkin Sep 03 '20 at 18:58
  • 3
    @MaximEgorushkin It even seems that a [relevant patch has been proposed](https://patchwork.ozlabs.org/project/gcc/patch/alpine.DEB.2.02.1605232038220.30609@laptop-mg.saclay.inria.fr/), but not accepted because of breaking backward compatibility regarding calling conventions. – Daniel Langr Sep 03 '20 at 19:37
  • I thought the failure of such optimization should be related to std library implementation, so I switched between compilers on godbolt. But I never knew clang on godbolt was using libstdc++ by default and in order to use libc++ the corresponding argument must be provided. – YumeYao Sep 04 '20 at 01:10
  • 4
    ABI stability fundamentalists are ruining C++ once again. :-( – Konrad Rudolph Sep 04 '20 at 07:58
4

As suggested by @StoryTeller, this behavior might be caused by a user-defined move constructor inside std::tuple.

See for example: https://godbolt.org/z/3M9KWo

Having a user-defined move constructor leads to the non-optimized assembly:

bar_my_tuple():
        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(MyTuple<int>)
        add     rsp, 24
        ret

In libcxx, for example, the copy and move constructors are declared as defaulted for both tuple_leaf and tuple, so you get the small-size struct calling convention optimization for std::tuple<int>. You do not get it for std::tuple<std::string>, which holds a non-trivially-movable member and is therefore not trivially movable itself.

Amir Kirsh
  • 1
    You need to tell the compiler to use libc++: https://godbolt.org/z/WcTjM9. By default, clang on Compiler Explorer uses libstdc++. – Daniel Langr Sep 03 '20 at 12:58
  • @DanielLangr hey that's great! it actually presents that for libc++ `std::tuple` is move constructible and thus it has the small-size struct call convention optimization! – Amir Kirsh Sep 03 '20 at 13:03