Efficiency of Operator overloading regarding returned Object

Question

I'm trying to write a C++ class for managing data in a specific way. More specifically: to mimic mathematical matrix behavior, such as matrix multiplication, with as little overhead as possible. Intuitively, I would like to use overloaded operators for better readability, but I always wonder about the additional allocation for returning the result matrix, especially regarding performance when the matrix is really large.

So far, I have everything set up to allocate and correctly access the stored data by providing dimension coordinates. The objects are heap-allocated and handled through pointers, to do some fancy stuff with transforming the matrix. Adding looks like this so far:

Matrix* Add(Matrix* A, Matrix* B, Matrix* result)
{
[...]  //Math here

return result;
}

So when calculating, the result is written directly into the result object, since it is accessed via a pointer.

But when using operator overloading, my confusion starts to grow. I would implement something like this (simplified, minimal problem):

Matrix* operator+(Matrix& other)
{
Matrix* result = new Matrix;
[...] //Math here
return result;
}

Imagine we live in a perfect world and leakage is magically solved. There is still the problem that I don't see a way around allocating memory for the result internally. If I already have the object preallocated, or I recycle one from previous calculations, isn't allocating the result memory a waste of computational power? Is there a more efficient way around this, or is it maybe optimized internally in a way I don't know of?

`Matrix* Add(Matrix* A, Matrix* B, Matrix* result)` -- Work with Matrix *objects* and *references*, not Matrix pointers. Internally, the `Matrix` uses pointers, but to the client, it should be an object. What you have written so far is not the canonical way to develop a Matrix class. What a C++ programmer would expect is: `Matrix Add(const Matrix& A, const Matrix& B);` — PaulMcKenzie, Apr 10 '23 at 20:43
Implement `operator +=`. Then, implement `operator+` using `operator +=`. Where performance is concerned have the code use `+=`, and use `+` for convenience, in non-critical situations. — Sam Varshavchik, Apr 10 '23 at 20:46
See also: https://stackoverflow.com/questions/4421706/what-are-the-basic-rules-and-idioms-for-operator-overloading — Paul Sanders, Apr 10 '23 at 20:47
*but i always wonder about the additional allocation for returning the result-Matrix. Especially regarding performance if you think about a really large Matrix.* -- Don't fall into the trap of trying to be too cute in using pointers. Many times it makes the code *slower*, not faster. The reason is that the compiler's optimizer will give up trying to optimize code that has pointers all over the place, all due to pointer aliasing. If you had written a class that returned references/objects, let the compiler apply NRVO instead of trying to outsmart the optimizer by using pointers. — PaulMcKenzie, Apr 10 '23 at 20:50
The API of your class would generate error-prone code that is very tricky to read and write. Don't make assumptions about "overheads"; I'd stick with *value semantics*. — MatG, Apr 10 '23 at 20:52
suggested reading https://stackoverflow.com/questions/12953127/what-are-copy-elision-and-return-value-optimization — 463035818_is_not_an_ai, Apr 10 '23 at 20:55
the result of adding two matrices is a third matrix, no matter how many pointers you add — 463035818_is_not_an_ai, Apr 10 '23 at 20:56
*So when calculating, the result is directly written into the result object since its acessed via Pointer.* -- `Matrix Add(const Matrix& A, const Matrix& B) { Matrix result; ... return result; }` -- You will be surprised that this does the equivalent of what you are asking for, all without the client having to "preallocate" anything, and all through the magic of returned value optimization invoked by the compiler. It uses no pointers, and looks "natural" to the caller of this function. — PaulMcKenzie, Apr 10 '23 at 21:03
There are libraries (e.g. eigen) which use various template machinery to remove temporaries and do lazy evaluation (so they defer performing calculations as long as possible, and can then optimise how results of a long series of calculations are performed). The techniques don't tend to involve cute pointer trickery. — Peter, Apr 10 '23 at 23:58

463035818_is_not_an_ai · Accepted Answer · 2023-04-11T07:20:47.270

To a large extent, the problem you are worried about has already been solved. Consider this dummy matrix and an example of calling `operator+`:

#include <iostream>

struct matrix {
    matrix() = default;
    // Report every copy so we can count them.
    matrix(const matrix&) { std::cout << "copy!!!\n"; }
    // Neither operand is modified, so take both as const.
    matrix operator+(const matrix& other) const {
        matrix result;
        return result;
    }
};


int main() {
    std::cout << "how many copies?\n";
    matrix a, b;
    matrix c = a + b;
}

I left out the actual matrix elements and the addition so we can focus on `return result;`. You are worried about the copy that is made. Due to copy elision, the output of the code is:

how many copies?

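No copy is made. The rules for copy elision changed substantially in C++17 (from optional to mandatory in some cases): initializing `c` from the returned temporary is now guaranteed not to copy, while eliding the copy of the named local `result` (NRVO) remains an optional optimization that mainstream compilers typically perform. For details, I refer you to "What are copy elision and return value optimization?" (https://stackoverflow.com/questions/12953127/what-are-copy-elision-and-return-value-optimization) and https://en.cppreference.com/w/cpp/language/copy_elision.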

However, to get good efficiency with more complex expressions as well, e.g. `matrix r = (a + b) * c + (d + e) * f;`, you should consider expression templates (see e.g. https://en.wikipedia.org/wiki/Expression_templates). It is a technique for deferring the actual computation until the result is needed. In the example, mathematically no temporary is required for the result of `a + b`, but a straightforward C++ implementation creates one. C++ does, however, allow `operator+` to return an arbitrary custom type, and that type need not be a fully computed matrix.

score 0 · Answer 2 · answered Apr 10 '23 at 20:53

You could use `std::shared_ptr` instead of raw pointers. The big advantage is that it provides the "magic" you are looking for to avoid leaks.

That being said, you'd be better off with a Matrix object that handles memory management internally. One advantage is that you could deal with different kinds of matrices (e.g. sparse matrix, normal matrix, etc.). Another advantage is that you leave the optimization to the compiler, using move semantics and other techniques, which is not as straightforward with pointers.
