Returning Large Objects by Value (move semantics) or by pointer?

Question

I have read several articles and answers on SO (in particular this), but they do not fully answer my question. They tend to focus on special cases where a move is as fast as copying a pointer, but that is not always the case.

For example, consider this class:

#include <map>
#include <set>
#include <string>
#include <vector>

struct Big {
 std::map<std::string, unsigned> m;
 std::vector<unsigned> v;
 std::set<std::string> s;
};

And this function:

Big foo();

If foo returns by value and the copy cannot be optimized via RVO, the compiler will apply move semantics, which implies 3 moves, one for each class member. If the class had more than 3 members, there would be even more operations. If foo returned the Big object by pointer (a smart pointer, maybe), it would always be 1 operation.

To make things even more interesting, Big objects have a non-local lifetime: they are kept in some data structures for the duration of the application. So you might expect the Big objects to be moved around multiple times during their life, and the cost of 3 operations (move semantics) vs 1 operation (pointer) keeps burdening performance long after the objects were returned by foo.

Given that background information, here are my questions:

1 - First of all, I would like to be sure about my understanding of move semantics performance: is it true that in the example above moving a Big object is slower than copying a pointer?

2 - Assuming move semantics is indeed slower, should I accept returning Big objects by pointer, or is there a better way to achieve both speed and a nice API (I consider returning by value a better API)?

[EDIT]

Bottom line: I like to return by value, because if I introduce one single pointer in the API, pointers spread everywhere. So I would like to avoid them. However, I want to be sure about the performance impact. C++ is all about speed, and I cannot blindly accept move semantics without understanding the performance hit.

The problem with a pointer is you have to do a dynamic allocation. That allocation can take a lot more time than it does to copy around a few pointers. Unless you have profiler data to back up that it is affecting performance, I would stick with return by value for movable types. — NathanOliver, Mar 31 '22 at 15:12
If you want to know about performance, benchmark it. Your description of the objects makes me wonder why you would want to move them at all. You're so focused on this particular optimization that it seems you've forgotten to consider the design of your program. — sweenish, Mar 31 '22 at 15:13
It's not 1 operation (pointer). It's 1 operation (pointer) *every time you use the object* versus 3 operations (move) *only when the object moves* — user253751, Mar 31 '22 at 16:18

eerorika · Answer 1 · 2022-03-31T15:35:15.630

they are kept in some data structures for the duration of the application. So you might expect the Big objects to be moved around multiple times during their life

I don't agree with this conclusion. Elements of most data structures tend to be quite stable in memory. Exceptions are unreserved std::vector and std::string, and other structures based on vector, such as flat maps.
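To illustrate the stability point, here is a minimal sketch. The helper names map_element_stays_put and vector_reallocations are made up for this example; the exact number of reallocations depends on the standard library's growth policy, so none is stated.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: checks that an element of a node-based container
// keeps its address while many more elements are inserted.
bool map_element_stays_put() {
    std::map<int, std::string> m;
    m[0] = "first";
    const std::string* before = &m[0];
    for (int i = 1; i < 10000; ++i) {
        m[i] = "x";  // inserting never relocates existing map nodes
    }
    return before == &m[0];
}

// Hypothetical helper: counts how often a vector changes capacity
// (i.e. reallocates and moves every element) while growing to n elements.
std::size_t vector_reallocations(std::size_t n, bool reserve_first) {
    std::vector<std::string> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front, no relocation afterwards
    }
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t capacity_before = v.capacity();
        v.emplace_back("x");
        if (v.capacity() != capacity_before) {
            ++reallocations;
        }
    }
    return reallocations;
}
```

Calling vector_reallocations(1000, false) should report at least one reallocation, while the reserved variant should report none. Elements stored in std::map, std::set, or std::list are not moved at all once inserted.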

If foo returns by value and the copy cannot be optimized via RVO

So, implement foo in a way that can be optimised via RVO. Preferably in such a way that the move is guaranteed to be elided, as C++17 does when the function returns a prvalue. This is fast and gives a convenient API, so it is what you should prefer.
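As a rough sketch of the difference, the version of Big below adds counting copy and move constructors purely for demonstration. The counters and the functions make_big_prvalue and make_big_named are assumptions of this example, not part of the question's code.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Demonstration-only counters (not part of the original Big).
int copy_count = 0;
int move_count = 0;

struct Big {
    std::map<std::string, unsigned> m;
    std::vector<unsigned> v;
    std::set<std::string> s;

    Big() = default;
    Big(const Big& o) : m(o.m), v(o.v), s(o.s) { ++copy_count; }
    Big(Big&& o) noexcept
        : m(std::move(o.m)), v(std::move(o.v)), s(std::move(o.s)) {
        ++move_count;
    }
};

// Returns a prvalue: since C++17 the object is constructed directly in the
// caller's storage, so neither the copy nor the move constructor runs.
Big make_big_prvalue() {
    return Big{};
}

// Returns a named local: NRVO is permitted but not guaranteed. If the
// compiler does not elide, the return falls back to a move, never a copy.
Big make_big_named() {
    Big b;
    b.v.push_back(1u);
    return b;
}
```

After Big a = make_big_prvalue(); both counters stay at zero under C++17. After Big b = make_big_named(); the copy counter stays at zero and the move counter is at most one, depending on whether the compiler applied NRVO.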

1 - First of all, I would like to be sure about my understanding of move semantics performance: is it true that in the example above moving a Big object is slower than copying a pointer?

It is true. Moving Big is relatively slower than copying a pointer. They are both rather light operations in absolute terms, though (depending on context).

When you think about returning a pointer to a newly created object, you must also think about the lifetime of the object and where it is stored. If you're thinking of allocating it dynamically, and returning a pointer to the dynamic object, then you must consider that the dynamic allocation may be much more expensive than the few moves of the member objects. And furthermore, all of this may be insignificant in relation to all of the allocations that the std::map and other containers will do, so none of this deliberation may end up mattering in the end.

In conclusion: If you want to know what is faster, then measure. If one implementation measures significantly faster, then that implementation is probably the one that is faster (depending on how good you are at measuring).
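A measurement could start from something like the sketch below. Everything here other than Big is an assumption made for illustration: the helper time_ns, the two factory functions, the Timings struct, and the element counts. A serious benchmark would also need warm-up runs, repetition, and protection against the optimizer removing work, so treat this only as a starting point.

```cpp
#include <chrono>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Big {
    std::map<std::string, unsigned> m;
    std::vector<unsigned> v;
    std::set<std::string> s;
};

// Hypothetical helper: runs op `iterations` times, returns elapsed nanoseconds.
template <class Op>
long long time_ns(int iterations, Op op) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        op();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}

// Hypothetical factory returning by value (eligible for NRVO, else moved).
Big make_by_value() {
    Big b;
    b.v.assign(100, 1u);
    b.m["key"] = 1u;
    b.s.insert("key");
    return b;
}

// Hypothetical factory returning a smart pointer (one extra heap allocation).
std::unique_ptr<Big> make_by_pointer() {
    auto p = std::make_unique<Big>();
    p->v.assign(100, 1u);
    p->m["key"] = 1u;
    p->s.insert("key");
    return p;
}

struct Timings {
    long long by_value_ns;
    long long by_pointer_ns;
};

// Creates `iterations` objects each way and stores them in reserved vectors,
// so the containers themselves do not reallocate during the measurement.
Timings compare(int iterations) {
    std::vector<Big> values;
    values.reserve(iterations);
    std::vector<std::unique_ptr<Big>> pointers;
    pointers.reserve(iterations);

    Timings t{};
    t.by_value_ns =
        time_ns(iterations, [&] { values.push_back(make_by_value()); });
    t.by_pointer_ns =
        time_ns(iterations, [&] { pointers.push_back(make_by_pointer()); });
    return t;
}
```

Calling compare(100000) from main and printing both fields gives a first impression on a given compiler and machine. The numbers will vary with optimization level and allocator, which is why no expected result is given here.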
