Question> What is the recommended way to pass __int128_t as a function parameter?

Thank you

#include <iostream>
bool CheckInt(const __int128_t& large_number)
{
    return large_number > 10000; // Just for Demo
}

bool CheckInt2(__int128_t large_number)
{
    return large_number > 10000;
}

int main()
{
    __int128_t abc = 20000;
    std::cout << CheckInt(abc) << std::endl;
    std::cout << CheckInt2(abc) << std::endl;

    return 0;
}
q0987
  • Which one was faster? – eerorika May 11 '22 at 15:36
  • Which compiler? – Jonathan S. May 11 '22 at 15:44
  • In your example, they're going to be inlined, and the difference evaporates as a result. – Ben Voigt May 11 '22 at 15:51
  • On my platform, aggregate parameter size on the stack versus a reference, the size < 256 is a win for pass-by-value, 256-512 is about the same, and >512 is a win for pass by reference. Those numbers are very platform specific, so you'd need to profile to see where the sweet spot is for your platform. – Eljay May 11 '22 at 15:51
  • It may depend on your platform, but at least on x86-64 and ARM64, `__int128` can be passed by value in two registers, avoiding memory entirely. So that should be better than passing by reference. Of course if the function is to be called so often that the parameter passing overhead is significant, then it probably should be inlined anyway. – Nate Eldredge May 11 '22 at 15:57
  • Print out the assembly language first. Some processors can use multiple registers for a 128 bit value. Let your compiler decide. – Thomas Matthews May 11 '22 at 15:57
  • @JonathanS. gcc 10.2.1 – q0987 May 11 '22 at 16:31
  • @BenVoigt, the implementation of that function is more complicated than the one shown above. – q0987 May 11 '22 at 16:31
  • @NateEldredge, this is centos 7 64bit. – q0987 May 11 '22 at 16:32
  • @q0987: You won't learn much about performance tradeoffs of your more complicated function by asking questions about a trivial one. – Ben Voigt May 11 '22 at 19:16

1 Answer

Let's look at four scenarios.

These were compiled by gcc for a 64-bit x86 architecture; other compilers should give similar results.

  1. How the functions are compiled:
bool by_value(__int128 large_number) {
    return large_number > 10000;
}

bool by_reference(const __int128& large_number) {
    return large_number > 10000;
}

The x86 assembler output can be seen at https://godbolt.org/z/v9cM8xj35

by_value(__int128):
        mov     eax, 10000
        cmp     rax, rdi  # Use first 8 bytes
        mov     eax, 0
        sbb     rax, rsi  # Use second 8 bytes
        setl    al
        ret
by_reference(__int128 const&):
        mov     eax, 10000
        cmp     rax, QWORD PTR [rdi]    # Use first 8 bytes
        mov     eax, 0
        sbb     rax, QWORD PTR [rdi+8]  # Use second 8 bytes
        setl    al
        ret

The commented lines are the only lines that differ.

This shows the calling convention of the platform (the x86-64 System V ABI): the first (low) 8 bytes of the argument are passed in rdi, and the second (high) 8 bytes in rsi.

When you pass by value, large_number will be stored in these two registers, and can be used quickly and efficiently.

When you pass by reference, only one register (rdi) is used, and it holds a pointer to the value. The first 8 bytes are read by dereferencing it as QWORD PTR [rdi], and the second 8 bytes as QWORD PTR [rdi+8] (the pointer plus an offset).

Passing by value will win out in most situations here. If the function has a lot of arguments or local variables, the registers holding large_number may "spill" onto the stack, so in theory passing by value could cost more work. But under that much register pressure the one-register pointer would probably spill just as the two-register 16-byte value would, so there shouldn't be much difference in practice.
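
To make the "first 8 bytes" and "second 8 bytes" concrete, here is a minimal sketch that splits a 128-bit value into the two 64-bit halves that travel in rdi and rsi in the listing above. The helper names (low_half, high_half, join_halves) are made up for illustration, and the code assumes GCC or Clang on a 64-bit target, since __int128 is a compiler extension.

```cpp
#include <cstdint>

// Assumption: __int128 is available (GCC/Clang extension) and is 16 bytes.
static_assert(sizeof(__int128) == 16, "expected a 16-byte __int128");

// Low 8 bytes: the half that by_value receives in rdi.
inline std::uint64_t low_half(__int128 v) {
    return static_cast<std::uint64_t>(v);
}

// High 8 bytes: the half that by_value receives in rsi.
inline std::uint64_t high_half(__int128 v) {
    return static_cast<std::uint64_t>(static_cast<unsigned __int128>(v) >> 64);
}

// Reassemble the original value from its two halves.
inline __int128 join_halves(std::uint64_t high, std::uint64_t low) {
    return static_cast<__int128>(
        (static_cast<unsigned __int128>(high) << 64) | low);
}
```

For a small positive value such as 20000 the high half is zero, and for a negative value it is all ones because of sign extension.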


  2. Calling the function with an existing __int128 variable:
bool by_value(__int128);
bool by_reference(const __int128&);

extern __int128 x;

extern bool call_by_value() {
    return by_value(x);
}

extern bool call_by_reference() {
    return by_reference(x);
}

https://godbolt.org/z/7sT8b33Ez

call_by_value():
        mov     rdi, QWORD PTR x[rip]
        mov     rsi, QWORD PTR x[rip+8]
        jmp     by_value(__int128)
call_by_reference():
        mov     edi, OFFSET FLAT:x
        jmp     by_reference(__int128 const&)

It may look like more work needs to be done in the by-value case: to call by reference, you only need to load the address of x (OFFSET FLAT:x) into edi and call the function, whereas in the by-value case the value of x needs to be read into the two registers before the function can be called.

However, recall that by_reference has to go through the pointer to use the value. The by-reference version is just hiding the loads of x[rip] and x[rip+8] inside the function, so there isn't much difference.


  3. Calling the function with some constant value (or something that optimizes to it):
bool call_by_value() {
    __int128 abc = 20000;
    return by_value(abc);
}

bool call_by_reference() {
    __int128 abc = 20000;
    return by_reference(abc);
}

https://godbolt.org/z/6jhEWfh6a

call_by_value():
        mov     edi, 20000  # Stores 20000 into the first register
        xor     esi, esi    # Stores 0 into the second register
        jmp     by_value(__int128)
call_by_reference():
        sub     rsp, 24
        mov     rdi, rsp  # Store current stack pointer (which will point to abc)
        mov     QWORD PTR [rsp], 20000  # Store first 8 bytes on stack
        mov     QWORD PTR [rsp+8], 0    # Store second 8 bytes on the stack
        call    by_reference(__int128 const&)
        add     rsp, 24
        ret

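Calling by reference needs to do a lot more: the value has to be written to the stack, and then a pointer to it is passed to the function. It also cannot be a tail call (call instead of jmp), because the stack slot has to stay alive until by_reference returns.

Calling by value can just store the value into the two registers and jump to the function.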


  4. Calling the function with a runtime calculated prvalue (here the "calculation" is just a copy):
bool call_by_value() {
    return by_value(+x);
}

bool call_by_reference() {
    return by_reference(+x);
}

https://godbolt.org/z/vqdGEeGY9

call_by_value():
        mov     rdi, QWORD PTR x[rip]
        mov     rsi, QWORD PTR x[rip+8]
        jmp     by_value(__int128)
call_by_reference():
        sub     rsp, 24
        movdqa  xmm0, XMMWORD PTR x[rip]  # Store the value of x into a 16 byte register
        mov     rdi, rsp                  # Store current stack pointer
        movaps  XMMWORD PTR [rsp], xmm0   # Write 16 bytes to the stack pointer
        call    by_reference(__int128 const&)
        add     rsp, 24
        ret

So to pass the result of a calculation, in the by-value case the calculation can be done directly in registers. In the by-reference case, the value needs to be calculated, stored onto the stack, and then a pointer to it needs to be passed.
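
The difference between scenarios 2 and 4 can also be observed from inside C++. The following is a minimal sketch, not part of the original measurements: refers_to is a hypothetical helper, and x is defined here only as a stand-in for the extern variable used above. It checks whether the reference parameter is bound to x itself or to a separate temporary.

```cpp
// Hypothetical helper: reports whether the reference parameter is bound
// to the exact object that `expected` points to.
bool refers_to(const __int128& value, const __int128* expected) {
    return &value == expected;
}

__int128 x = 20000;  // stand-in for the `extern __int128 x` used above

// Passing the lvalue x: the reference binds directly to x.
bool lvalue_binds_directly() {
    return refers_to(x, &x);
}

// Passing the prvalue +x: a temporary copy has to be materialized in
// memory so that the reference has something to point at.
bool prvalue_needs_temporary() {
    return !refers_to(+x, &x);
}
```

Both functions return true: the lvalue is passed as the address of x, while the prvalue gets its own storage, which is the stack slot visible in the call_by_reference listing.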


There is one more issue: when you have extern bool by_reference(const __int128&);, and you don't have whole program optimization or link time optimization, the compiler can't assume that by_reference leaves the value it is passed unmodified. After all, it could look like:

bool by_reference(const __int128& large_number) {
    // Well-defined as long as the referenced object is not itself const
    const_cast<__int128&>(large_number) = 0;
    return false;
}

This can disable some further optimizations.
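
As a sketch of why the caller has to be pessimistic, the example below contrasts a callee that writes through its const reference with one that takes a copy. All function names here are hypothetical. Writing through the reference is well-defined only because the caller's object is not declared const.

```cpp
// Hypothetical callee: takes a const reference but writes through it anyway.
bool sneaky_by_reference(const __int128& large_number) {
    const_cast<__int128&>(large_number) = 0;
    return true;
}

// The caller must re-read `value` after the call; it cannot assume the
// 20000 it stored is still there.
__int128 value_after_reference_call() {
    __int128 value = 20000;
    sneaky_by_reference(value);
    return value;
}

// Hypothetical callee taking a copy: whatever it does to its parameter,
// the caller's object is untouched.
bool harmless_by_value(__int128 large_number) {
    large_number = 0;
    return large_number == 0;
}

__int128 value_after_value_call() {
    __int128 value = 20000;
    harmless_by_value(value);
    return value;
}
```

After the by-reference call the caller's variable is 0, whereas after the by-value call it is still 20000, so only in the second case may the compiler keep treating it as a known constant.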


All in all, it is better in most cases to pass by value. On other architectures, the default calling convention may be to pass 16-byte arguments on the stack, which would make the two cases much more alike.

Some people will say that you should only pass something the size of a pointer or smaller by value, and everything else should be passed by reference. However, this fails to account for how much faster registers are than the stack.

This was based on analysis of the generated assembly, not on actual timings. You would probably have to call a function many, many times for this to make a difference.
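
If you want actual timings, a rough harness along the following lines could be a starting point. It is only a sketch under several assumptions: the names (timed_by_value, timed_by_reference, time_calls, TimingResult) are invented for this example, and noinline is a GCC/Clang attribute used to stop the calls from being inlined. The optimizer may still transform the calls in other ways, so check the generated assembly before trusting any numbers.

```cpp
#include <chrono>
#include <cstdint>

// Assumption: GCC/Clang. noinline keeps the calls from being inlined away.
__attribute__((noinline)) bool timed_by_value(__int128 n) {
    return n > 10000;
}

__attribute__((noinline)) bool timed_by_reference(const __int128& n) {
    return n > 10000;
}

struct TimingResult {
    std::uint64_t hits;                // how many calls returned true
    std::chrono::nanoseconds elapsed;  // wall-clock time for the whole loop
};

// Calls fn once per iteration with a freshly computed argument and
// accumulates the results so the loop has an observable effect.
template <typename Fn>
TimingResult time_calls(Fn fn, std::uint64_t iterations) {
    std::uint64_t hits = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        hits += fn(static_cast<__int128>(i)) ? 1 : 0;
    }
    const auto stop = std::chrono::steady_clock::now();
    return {hits,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

A call such as time_calls(timed_by_value, iterations) and the matching call with timed_by_reference should report the same hit count; only the elapsed field is expected to differ, and by how much depends on the platform and compiler flags.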

Artyer
  • The last - 1 paragraph says it all. – Michael Chourdakis May 11 '22 at 16:46
  • re: your answer on [Passing unique_ptr compiles to more indirections than passing raw or even wrapped pointer](https://stackoverflow.com/q/72232861) - seems *I* was the one getting mixed up. https://godbolt.org/z/a8Tedjcre shows that with a non-trivial copy constructor, you do get the arg passed by reference in x86-64 SysV, rather than by value on the stack like the C ABI does for stuff too large to fit in registers. – Peter Cordes May 13 '22 at 18:54