0

Now, I know this is a common question, but I haven't been able to really find a straight answer on this. This is really a question about standards. I am working on a project involving the genetic algorithm. But I'm running into a bottleneck when it comes to returning a vector. Is there a "proper" way to do this. Normally I use dynamically allocated arrays, and return a pointer to a newly created array.

obj* func(obj* foo);

That way, everything is efficient and there is no copying of data. Is there an equivalent to doing this with a vector? This vector has objects in it, so returning it by value would be slow I imagine. Is the only solution to pass a "resultant" vector by reference?

void func(vector<obj> &input, vector<obj> &result);

And, on a side note for future reference, is it standard practice to use a vector or other STL container over a dynamically allocated array? Are dynamically allocated arrays only used as a low level tool for designing containers? Are they just a relic of the past?

Dimi
  • 187
  • 1
  • 2
  • 7
  • 1
    How about use std::vector and std::move ? – digit plumber Apr 17 '14 at 21:52
  • 4
    Don't imagine. Benchmark. – Benjamin Lindley Apr 17 '14 at 21:53
  • 1
    Your best case scenario (for performance) is one allocation and then passing/returning that pointer as needed. It can only get worse from there, so while C++11 affords some nice stuff with move semantics it's at best as efficient but can be worse. – qeadz Apr 17 '14 at 21:55
  • Agree with @qeadz. If you want to *know* that the code will perform optimally, allocate the array once and pass it to the function to fill. – japreiss Apr 17 '14 at 21:58
  • Have you looked at this [question](http://stackoverflow.com/questions/12953127/what-are-copy-elision-and-return-value-optimization) yet? – jliv902 Apr 17 '14 at 22:01
  • Are you using C++11? http://stackoverflow.com/questions/6211575/proper-way-move-semantics-to-return-a-stdvector-from-function-calling-in-c – David Zech Apr 17 '14 at 22:05
  • "When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue." – Veritas Apr 17 '14 at 22:11
  • 1
    @jliv902: There is a hidden cost in that API, in that each call to the function *creates* a new vector, you cannot reuse one vector (and the allocation) for different calls. Whether this matters enough to use an uglier interface is a completely different question. – David Rodríguez - dribeas Apr 17 '14 at 22:12
  • 3
    _"Now, I know this is a common question, but I haven't been able to really find a straight answer on this."_ Every time someone starts a question with that it really means "I didn't look hard enough or I didn't understand the answers" **not** that there aren't dozens of straight answers – Jonathan Wakely Apr 17 '14 at 22:25
  • @qeadz, people asking beginner questions should not be encouraged to pass around dynamically allocated objects by raw pointer. Ever. The chances that someone asking this kind of question needs the *absolute* maximum performance *at any cost* is approximately zero anyway. – Jonathan Wakely Apr 17 '14 at 22:38
  • @JonathanWakely you are right that its highly unlikely someone needs the absolute maximum performance at any cost. He should preferably look into reusing memory and not doing the allocation - not to mention the safety in not passing pointers around. However I strongly disagree with your sentiments that the information shouldn't be present. Nothing infuriated me more when I was learning than only having answers of "you should rather be doing this". I think the extra tid-bits are what these comments are for. "This is how you should do it" is for the actual answers. – qeadz Apr 17 '14 at 22:47
  • 1
    @qeadz, a comment doesn't have room to explain the pros and cons, so all you've given is something that's usually a bad idea, with no context. – Jonathan Wakely Apr 17 '14 at 23:17
  • @qeadz, you could at least suggest using `unique_ptr` which is zero overhead and makes code safer via "nice stuff with move semantics" – Jonathan Wakely Apr 17 '14 at 23:26
  • 2
    @qeadz: Actually, comments are for requesting clarification and for trolling people. – Lightness Races in Orbit Apr 17 '14 at 23:28
  • 1
    @qeadz: I completely agree with Jonathan here. Comments are the worst place to suggest unsafe constructs as there is no space to explain the cons. Now someone reads the comment and decides that it it the best thing to do… well, they avoided 2 pointer copies (that is the additional cost of returning a `std::vector` over a raw pointer) which depending on the calling convention might be in registers. Compare that with the cost of the memory allocation and filling the data, what is the real cost? *Nothing*. What is the benefit? Tons of fun managing memory. – David Rodríguez - dribeas Apr 18 '14 at 02:46
  • @JonathanWakely You have four comments here and none of them addressed the question itself. Now, regardless of whether or not what you say is true, you could have at least have addressed the issue a little more constructively and pointed me in the right direction. Also, understand that I learned to code in an academic setting, so it was more algorithm centric. They didn't teach good coding practice. We had to do things the hard (and not always efficient) way. Recreate already made structures, full memory management and all that jazz. Now I want to learn how to do it the standardized way. – Dimi Apr 18 '14 at 05:20
  • @Dimi, I voted for dribeas's answer, as it says what I would say. I can't downvote comments, so wanted to point out that unmanaged dynamic allocation is usually not a good idea _even if_ it is fast (which is not actually assured, as dribeas commented above) – Jonathan Wakely Apr 18 '14 at 11:22
  • @JonathanWakely Appreciate the input! – Dimi Apr 19 '14 at 03:18

4 Answers4

6
vector<obj> func(const vector<obj>& input);

All compilers implement RVO either by default or when you turn optimizations on, if you don't turn optimizations on you don't care about performance, so that would not be an issue anyway.

If your compiler is C++11 conforming, then in the worst case instead of copying 1 pointer you will pay for 3 pointers when move semantics kick in, which compared with the cost of allocating the memory is really no cost.

Build a simple meaningfull interface, then measure the performance and bottlenecks and optimize from there.

Depending on your use pattern, if you can reuse the output container, you might want to transform the above into:

void func(vector<obj>& output, const vector<obj>& input);

This will have the same performance when creating a new vector, but if you call this function in a loop, you may be able to avoid a few allocations:

std::vector<obj> output;
for ( ... ) {
   output.clear();
   func(output, input);

// compare with
for ( ... ) {
   std::vector<obj> output = func(input);

In the first block of code, if all of the vectors are of the same size, the clear() will remove the elements but leave the buffer intact, so that the next call to func need not allocate. In the second case it needs to allocate inside func and will deallocate at the end of the scope.

This is a slightly uglier interface to use, so I would opt for the first one (which semantically is cleaner), and use the second only if the first proves to have an issue with multiple allocations.

David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
2

Is the only solution to pass a "resultant" vector by reference?

No. You can also pass an iterator, like algorithms do:

void func( InputIt first, InputIt last, OutputIt res);

you can then make this a template:

template< class InputIt, class OutputIt >
void func( InputIt first, InputIt last, OutputIt res);
4pie0
  • 29,204
  • 9
  • 82
  • 118
2

If you want to be reasonably sure that your function doesn't copy the vector, then:

 void func(const vector<obj> &input, vector<obj> &result);

If you are OK with relying on compiler "return value optimisation" (RTVO) (most good compilers do this):

 vector<obj> func(const vector<obj> &input); 

Return value optimisation is essentially where the compiler knows that we're returning a copy of a vector, so it removes the "extra copy" that would normally be needed in returning the content of, in this case, vector<obj> from func. A C++11 compliant compiler should use the "move constructor" if it is not able to do RTVO, which is also small cost in the whole scheme of things.

If you are happy to dynamically allocate the vector itself, you can return a pointer to it, but this is a poor solution:

vector<obj>* func(const vector<obj> &input);

This solution relies on allocating vector<obj>, which means that something somewhere has to delete the result of the call to func. Maintenance of such code, if func is called many times, gets rather nasty (especially if func gets called lots of times, which may well be the case in a genetics analysis situation).

It is usually hard to achieve a return of a reference, so you are probably doing something wrong if you do - DO NOT DO THIS (unless you know what you are doing and why):

vector<obj>& func(const vector<obj> &input); 

Of course, it REALLY depends on what you are doing with the vectors, and there may be other solutions that are even better, if you give a more concrete example of what you are doing inside the function.

By the way:

obj *func(obj *input)

is a maintenance nightmare, because if obj* is allocated in func, it is now for the caller to maintain this structure.

Finally, ALWAYS when you deal with performance, it's important to benchmark with YOUR compiler, on YOUR machine(s). Different compilers and different systems behave differently - guessing based on looking at the source-code is a very poor solution for understanding what code will be generated and how fast it is.

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • You don't even have to rely on RTVO. Worst-case you're going to trigger the move-constructor. – PeterT Apr 17 '14 at 22:11
  • Agreed about the move constructor "worst case" not being a big deal (in C++11 anyway). But if you **are** going to return a pointer, you can return a unique_ptr, or a boost/std::shared_ptr instead of a bare pointer. – M2tM Apr 17 '14 at 22:22
  • Yes, but allocating a pointer to hold a `vector` is a bad idea in the first place, even if it's stored in a wrapper that ensures it's deleted at the end. So not returning a pointer in the first place is still a better solution. – Mats Petersson Apr 17 '14 at 22:25
0

If you can, always use a reference (&) instead of using the class itself. This even refers to smart pointers (std::shared_ptr), because you will avoid an extra copy even of the pointer itself. And just to make sure that your returned pointer does not get overwritten, pre-pend const to it, so:

void Function(const std::shared_ptr<SomeClass> &someObject);
santahopar
  • 2,933
  • 2
  • 29
  • 50