The way you have it written, I'd expect B::operator* to run slightly slower. This is because the "under the hood" implementation of A::operator* is something like:
// Pseudocode: you can't actually spell out the `this` parameter in C++.
inline A A::operator*(A* this, A b)
{
    A out;
    out.x = this->x * b.x;
    out.y = this->y * b.y;
    return out;
}
So A passes a pointer to its left-hand-side argument into the function, while B has to make a copy of that argument before calling the function. Both have to make copies of their right-hand-side arguments.
Your code would be much better off, and would probably compile to the same thing for A and B, if you wrote it using references and made it const correct:
struct A
{
    float x, y;
    inline A operator*(const A& b) const
    {
        A out;
        out.x = x * b.x;
        out.y = y * b.y;
        return out;
    }
};

struct B
{
    float x, y;
};

inline B operator*(const B& a, const B& b)
{
    B out;
    out.x = a.x * b.x;
    out.y = a.y * b.y;
    return out;
}
You still want to return objects, not references, since the results are effectively temporaries (you're not returning a modified existing object).
Addendum
However, with const pass-by-reference for both arguments in B, would that make it effectively faster than A, due to the dereferencing?
First off, both involve the same dereferencing when you spell out all the code. (Remember, accessing members through this implies a pointer dereference.)
But even then, it depends on how smart your compiler is. In this case, let's say it looks at your structure and decides it can't stuff it in a register because it's two floats, so it will use pointers to access them. So the dereferenced pointer case (which is what references get implemented as) is the best you'll get. The assembly is going to look something like this (this is pseudo-assembly-code):
// Setup for the function. Usually already done by the inlining.
r1 <- this
r2 <- &result
r3 <- &b
// Actual function.
r4 <- r1[0]
r4 <- r4 * r3[0]
r2[0] <- r4
r4 <- r1[4]
r4 <- r4 * r3[4]
r2[4] <- r4
This is assuming a RISC-like architecture (say, ARM). x86 probably uses fewer steps, but it gets expanded to about this level of detail by the instruction decoder anyway. The point is that it's all fixed-offset dereferences of pointers held in registers, which is about as fast as it gets. The optimizer can try to be smarter and spread the objects across several registers, but that kind of optimizer is a lot harder to write. (Though I have a sneaking suspicion that an LLVM-type compiler/optimizer could do that optimization easily if result were merely a temporary object that is not preserved.)
So, since you're using this, you have an implicit pointer dereference. But what if the object were on the stack? That doesn't help; stack variables turn into fixed-offset dereferences of the stack pointer (or the frame pointer, if one is used). So you're dereferencing a pointer somewhere in the end, unless your compiler is bright enough to take your object and spread it across multiple registers.
Feel free to pass the -S option to gcc to get an assembly listing of the generated code and see what's really happening in your case.