
I have to create two versions of the same function: one with all parameters listed, one with parameters passed as a struct. The number of parameters is arbitrary. I implement the functionality in only one of them, the other is just calling it with expanded parameters or initialized structure.

Is there a difference in the overhead between the two versions below?

Version 1

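/* Prototype so the wrapper can call the implementation defined below. */
int functionWithMultipleParams(int param1, int param2);

int functionWithStructure(MyStructure a)
{
    return functionWithMultipleParams(a.Myparam1, a.Myparam2);
}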

int functionWithMultipleParams(int param1, int param2)
{
    return /* implement something */;
}

Version 2

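/* Prototype so the wrapper can call the implementation defined below. */
int functionWithStructure(MyStructure a);

int functionWithMultipleParams(int param1, int param2)
{
    return functionWithStructure((MyStructure) {param1, param2});
}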

int functionWithStructure(MyStructure a)
{
    return /* implement something */;
}
Jan Schultke
G. B.
  • I believe a good optimizer will eliminate the difference between the versions. But as usual with optimizations: you must profile in your specific case. Did you try to measure? – wohlstad Jun 13 '23 at 14:38
  • Do you really need to care? Most likely not. If you do, measure it. This is something that will differ across systems and architectures. – 12431234123412341234123 Jun 13 '23 at 14:39
  • @wohlstad The compiler can't optimize too much, assuming these aren't `static` functions. The layout of the `struct` has to be consistent, and so does the calling convention, so the compiler may be forced to pass the parameters in a less than ideal way. However, the compiler may add a third "function" that is "called" (jumped to) from both of these functions. – 12431234123412341234123 Jun 13 '23 at 14:41
  • @12431234123412341234123 what about inlining it all (or maybe I'm missing something here) ? But anyway - my main point was that it's crucial to measure when dealing with optimizations. And I join your comment about whether this specific area is really affecting overall performance. This is also something that requires profiling. – wohlstad Jun 13 '23 at 14:44
  • @G. B., The difference is [not important](https://softwareengineering.stackexchange.com/q/80084/94903). Code for clarity. – chux - Reinstate Monica Jun 13 '23 at 14:47
  • I would use a pointer to the `struct` and hide the definition of the `struct` inside the .c file, adding only a declaration (`struct MyStructure;`) to the header file. The reason is that you can extend the structure without breaking binary compatibility, and you encapsulate better. The performance difference is irrelevant 99% of the time. – 12431234123412341234123 Jun 13 '23 at 15:01
  • @12431234123412341234123 I'd be careful with that statement. Passing small and common structs like pairs of `int` via opaque pointer can really add up throughout the program. For example, `std::unique_ptr` not being passable via register can have a 10% performance impact in some applications. In general, this needs to be profiled and decided case-by-case. – Jan Schultke Jun 13 '23 at 19:41
  • @JanSchultke But an opaque `struct` can only be passed by address, so that wouldn't apply to an API. Internally, though, where the `struct` definition is visible, passing `struct`s by value is possible, so your concerns would be applicable there. – Andrew Henle Jun 13 '23 at 20:08
  • @JanSchultke What is `std::unique_ptr`? As far as I know, this is not valid syntax. A pointer can be passed via a single register on most architectures (not on all). I prefer readability and extendability over a 10% performance gain in 99% of cases. – 12431234123412341234123 Jun 14 '23 at 09:19
  • @12431234123412341234123 see https://stackoverflow.com/q/16894400/5740428 for `std::unique_ptr`. It cannot be passed via register because it is not trivially destructible, which is a requirement in the System-V ABI to be passed by register. – Jan Schultke Jun 14 '23 at 09:51
  • @JanSchultke The linked question is about a different programming language (C++). That doesn't make `std::unique_ptr` valid C (the language used for this question) code. Saying something should not be done in programming language A because programming language B has feature X, which isn't supported in language A, which shouldn't be used in this situation, is nonsensical. – 12431234123412341234123 Jun 14 '23 at 09:53
  • @12431234123412341234123 it is still a good anecdote, because `std::unique_ptr` not being passable by register means that we're passing it via its address in the ABI unnecessarily. Measuring the performance impact of doing so gives us an idea what the cost of passing `T*` around compared to `T` is for small objects. – Jan Schultke Jun 14 '23 at 09:56
  • for clarification: the question is about c, as specified in the tags – G. B. Jun 15 '23 at 10:07

1 Answer


You can't say that one version is always better than the other. Sometimes it is better to pack parameters into a struct, and sometimes it is worse.

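In the x86-64 System V ABI (the one the assembly below targets), there is a difference between passing 2x int and passing a single struct parameter that holds two ints.

  • in the former case, each int is passed via a separate register: edi and esi
  • in the latter case, both struct members are packed into the single 64-bit register rdi (the first member in the lower 32 bits, the second in the upper 32 bits)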

As a rule of thumb, a struct is better when we perform operations with the whole struct (like passing it to other functions), whereas separate parameters are better when using them in separate ways.
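Applied to the question's wrapper pair, a minimal sketch of Version 1 might look like this. It assumes the wrapper can be made `static inline` (for example in a header) so that the compiler sees its body at every call site. The layout of `MyStructure` and the placeholder body of `functionWithMultipleParams` are assumptions, added only so that the example compiles.

```c
/* Hypothetical layout; the question does not show the definition. */
typedef struct {
    int Myparam1;
    int Myparam2;
} MyStructure;

/* The single real implementation. The body is a placeholder assumption. */
int functionWithMultipleParams(int param1, int param2)
{
    return param1 * 2 + param2;
}

/* Thin forwarding wrapper. Being static inline, its body is visible to
   the compiler wherever it is called, which makes inlining possible. */
static inline int functionWithStructure(MyStructure a)
{
    return functionWithMultipleParams(a.Myparam1, a.Myparam2);
}
```

With the wrapper visible like this, an optimizing compiler can usually inline the forwarding call, so the question of which version holds the implementation mostly matters for calls that cross a translation-unit boundary. As the comments say, measuring is the only way to be sure.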

Positive Cost struct

struct point {
    int x;
    int y;
};

int sum(int x, int y) {
    return x + y;
}

int struct_sum(struct point p) {
    return p.x + p.y;
}

This produces (GCC 13, -O2):

sum:
        lea     eax, [rdi+rsi]
        ret
struct_sum:
        mov     rax, rdi
        shr     rax, 32
        add     eax, edi
        ret

You can see that sum simply computes the sum of rdi and rsi, whereas struct_sum first has to unpack the operands into separate registers, since they both start in rdi.
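The unpacking step can also be written out in C. The following is only a sketch: it assumes a little-endian target where `struct point` is exactly 8 bytes with no padding, and the helper names `point_bits` and `sum_from_bits` are hypothetical. It mirrors what the generated code does and is not something you would write in practice.

```c
#include <stdint.h>
#include <string.h>

struct point {
    int x;
    int y;
};

/* Assumption: two 32-bit ints with no padding, as on x86-64. */
_Static_assert(sizeof(struct point) == sizeof(uint64_t),
               "sketch assumes an 8-byte struct point");

/* Copy the struct's bytes into one 64-bit value, which is roughly how
   both members arrive together in rdi. */
uint64_t point_bits(struct point p)
{
    uint64_t bits;
    memcpy(&bits, &p, sizeof bits);
    return bits;
}

/* Mirror of struct_sum's assembly: shift the upper half down (shr),
   then add it to the lower half (add). Little-endian is assumed, so
   x sits in the low 32 bits and y in the high 32 bits. */
int sum_from_bits(uint64_t bits)
{
    uint32_t lo = (uint32_t)bits;          /* p.x */
    uint32_t hi = (uint32_t)(bits >> 32);  /* p.y */
    return (int)(lo + hi);
}
```

The extra shift in `sum_from_bits` is the cost that `sum` avoids, because its operands already arrive in separate registers.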

Negative Cost struct

struct point {
    int x;
    int y;
};

struct point lowest_bit(int x, int y) {
    return (struct point) {x & 1, y & 1};
}

struct point struct_lowest_bit(struct point p) {
    return (struct point) {p.x & 1, p.y & 1};
}

This produces (clang trunk, -O2):

lowest_bit:
        and     edi, 1
        and     esi, 1
        shl     rsi, 32
        lea     rax, [rdi + rsi]
        ret
struct_lowest_bit:
        movabs  rax, 4294967297
        and     rax, rdi
        ret

Note: GCC doesn't find this optimization for some reason.

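In this case, it's better for both members to be packed into rdi, because the & 1 on both members can then be performed at once: a single 64-bit and with the mask 4294967297 (0x0000000100000001) clears everything except the lowest bit of each 32-bit half.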
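The same trick can be spelled out by hand. This is a sketch, assuming `struct point` is exactly 8 bytes with no padding; the function name `lowest_bit_masked` is hypothetical. It applies one 64-bit mask to the whole struct instead of masking each member separately.

```c
#include <stdint.h>
#include <string.h>

struct point {
    int x;
    int y;
};

/* Assumption: two 32-bit ints with no padding, as on x86-64. */
_Static_assert(sizeof(struct point) == sizeof(uint64_t),
               "sketch assumes an 8-byte struct point");

/* Keep only the lowest bit of both members with a single 64-bit AND.
   The mask 0x0000000100000001 is 4294967297, the constant that clang
   loads with movabs. */
struct point lowest_bit_masked(struct point p)
{
    uint64_t bits;
    memcpy(&bits, &p, sizeof bits);         /* view the struct as 64 bits */
    bits &= UINT64_C(0x0000000100000001);   /* mask both halves at once */
    memcpy(&p, &bits, sizeof p);            /* view the bits as a struct again */
    return p;
}
```

For any input, the result should match `(struct point) {p.x & 1, p.y & 1}` from `struct_lowest_bit` above.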


Also see: C++ Weekly - Ep 119 - Negative Cost Structs (C++ video, but equally applies to C due to similar ABI).

Jan Schultke