Binary Operators Return Xvalue Instead of PRvalue?

Question

According to this blog -- which I realize is old, if it is no longer considered relevant please let me know -- the best method for implementing binary operators is the following...

// The "usual implementation"
Matrix operator+(Matrix const& x, Matrix const& y)
{ Matrix temp = x; temp += y; return temp; }

// --- Handle rvalues ---

Matrix operator+(Matrix&& temp, const Matrix& y)
{ temp += y; return std::move(temp); }

Matrix operator+(const Matrix& x, Matrix&& temp)
{ temp += x; return std::move(temp); }

Matrix operator+(Matrix&& temp, Matrix&& y)
{ temp += y; return std::move(temp); }

I tested this implementation, and in expressions like the following...

a + b + c + d

Where they are all matrices, I ended up with many move constructor and destructor calls that I don't believe are necessary. If the return type on all the operator+ taking an rvalue matrix were changed to Matrix&&, you eliminate all the move the constructors, and need only a single destructor call.

I made a simple program to show both implementations with code here.

Could anyone explain if doing this is wrong / bad, and why? I can't think of a reason why not to do it this way. It saves many constructor and destructor calls, and doesn't seem to break anything.

What would `operator+(Matrix const& x, Matrix const& y)` return? Not a reference to `temp`, I hope. Also, if someone writes `Matrix&& x = a + b + c; DoSomething(x);` they are gonna have a bad time; `x` becomes a dangling reference soon after it's initialized. — Igor Tandetnik, Mar 06 '14 at 22:25
this helps http://stackoverflow.com/questions/11726171/numeric-vector-operator-overload-rvalue-reference-parameter ? — user2485710, Mar 06 '14 at 22:26
@IgorTandetnik can an lvalue-reference be initialized from an rvalue reference? — Jay, Mar 06 '14 at 22:27
@Jay: I fixed it soon after I posted it. You beat me to it. Admittedly, that's a pretty far-fetched scenario. — Igor Tandetnik, Mar 06 '14 at 22:29
operator+(Matrix const& x, Matrix const& y) still returns the Lvalue as indicated above. You end up with one copy constructor call and one move constructor call in that case. Edit: you changed it. That does seem like a bit of a concern, but I can't think of a reason to create a matrix && and save it. — pat, Mar 06 '14 at 22:32
@pat no, binding an rvalue-reference to a temporary extends the lifetime of the temporary to that of the reference. It doesn't work when the rhs is a _reference_ to a temporary created somewhere else though, which is why it's dangling. — Jay, Mar 06 '14 at 22:34
regarding lvalues/rvalues http://stackoverflow.com/a/20717252/2485710 — user2485710, Mar 06 '14 at 22:37
@jay that's good to know! I'm thinking if that is the worst problem to this implementation, then it is could still be worth it. — pat, Mar 06 '14 at 22:38
@Jay: I think it could be bound to an lvalue-reference-to-const though, as in `const Matrix& x = a + b + c;`. Someone writing this is a bit more plausible. — Igor Tandetnik, Mar 06 '14 at 22:40
`Matrix` can easily be designed so that a move constructor is just two pointer assignments, and a destructor on a moved-from instance is essentially a no-op. It doesn't seem particularly beneficial to try and eliminate those - I imagine any performance hit from them would be completely dwarfed by actual matrix operations. It is difficult to envision a situation where those moves become an actual performance bottleneck. Premature optimization and all that. — Igor Tandetnik, Mar 06 '14 at 22:43
@IgorTandetnik absolutely, in regards to a matrix that is true. I was thinking more in terms of classes whose move constructors and destructors had significant overhead. Also, a workaround for dangling reference problem could be just creating a new temporary from the returned reference with Matrix && x = Matrix(a+b+c+d) — pat, Mar 06 '14 at 22:46
Can you give an example of such a class? It seems to me that, if a move constructor for your class has significant overhead, you are doing it wrong. — Igor Tandetnik, Mar 06 '14 at 22:48
A bit nitpicky: you're asking about replacing a *"prvalue"* return with an "xvalue" return. None of the `operator+`s in your post return an "lvalue": that would look like `Matrix&`. — Jeffrey Yasskin, Mar 24 '14 at 02:40

score 0 · Answer 1 · answered Apr 04 '14 at 14:04

You are pessemizing your code with move constructors here. The matrix addition can be done safely without move constructors at all and the compilers are clever enough to optimize it away.

Here is some test code to prove what I'm saying:

#include <stdint.h>

class Matrix3
{
public:
    float Mtx[3][3];

    inline Matrix3() {};
    inline Matrix3 operator+( const Matrix3& Matrix ) const
    {
        Matrix3 Result;
        for ( size_t i = 0; i != 3; ++i )
        {
            for ( size_t j = 0; j != 3; ++j )
            {
                Result.Mtx[i][j] = Mtx[i][j] + Matrix.Mtx[i][j];
            }
        }
        return Result;
    }

    virtual int GetResult() const
    {
        int Result = 0;

        for ( size_t i = 0; i != 3; ++i )
        {
            for ( size_t j = 0; j != 3; ++j )
            {
                Result += (int)Mtx[i][j];
            }
        }       

        return Result;
    }
};

int main()
{
    Matrix3 M;

    Matrix3 M1;
    Matrix3 M2;
    Matrix3 M3;
    Matrix3 M4;

    M = M1 + M2 + M3 + M4;

    return M.GetResult();
}

I use GCC: (GNU) 4.9.0 20131110 (experimental) as follows: g++ -O3 main.cpp -S

The output assembly looks as below:

_main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $176, %esp
    call    ___main
    fnstcw  14(%esp)
    fldz
    fadd    %st(0), %st
    fadds   LC0
    fadds   LC0
    fsts    140(%esp)
    movl    140(%esp), %eax
    fsts    144(%esp)
    movl    %eax, 20(%esp)
    movl    144(%esp), %eax
    fsts    148(%esp)
    movl    %eax, 24(%esp)
    fsts    152(%esp)
    movl    148(%esp), %eax
    fsts    156(%esp)
    movl    %eax, 28(%esp)
    fsts    160(%esp)
    movl    152(%esp), %eax
    fsts    164(%esp)
    movl    %eax, 32(%esp)
    fsts    168(%esp)
    movl    156(%esp), %eax
    fstps   172(%esp)
    movl    %eax, 36(%esp)
    movl    160(%esp), %eax
    flds    24(%esp)
    movl    %eax, 40(%esp)
    movl    164(%esp), %eax
    movl    %eax, 44(%esp)
    movl    168(%esp), %eax
    movl    %eax, 48(%esp)
    movl    172(%esp), %eax
    movl    %eax, 52(%esp)
    movzwl  14(%esp), %eax
    movb    $12, %ah
    movw    %ax, 12(%esp)
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %edx
    flds    20(%esp)
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %eax
    flds    28(%esp)
    addl    %eax, %edx
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %eax
    flds    32(%esp)
    addl    %eax, %edx
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %eax
    flds    36(%esp)
    addl    %eax, %edx
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %eax
    flds    40(%esp)
    addl    %eax, %edx
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %eax
    flds    44(%esp)
    addl    %eax, %edx
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %eax
    flds    48(%esp)
    addl    %eax, %edx
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %eax
    flds    52(%esp)
    addl    %eax, %edx
    fldcw   12(%esp)
    fistpl  8(%esp)
    fldcw   14(%esp)
    movl    8(%esp), %eax
    leave
    addl    %edx, %eax
    ret

There is not a single trace of any copy/move constructor or any function call at all. Everything is unrolled into a fast math-grinding stream of instructions.

Seriously, there is no need to write additional handlers for r-values. The compiler makes the perfect code without them.

This is true if the default constructor is trivial, but if, for instance, it zeroes the matrix, then this is slower than what I proposed. Obviously, all of these options depend on the class you are working with (Matrix was just an example from the linked website), but I believe returning XValues over PRValues is always faster if you go the route I posted. — pat, Apr 12 '14 at 18:33

Binary Operators Return Xvalue Instead of PRvalue?

1 Answers1