4

I have a function copying one vector to another (among many other things). A simplified version of the code is below.

void Fun(std::vector<double> &in, std::vector<double> &out) {
    out = in;
}

I care about maximizing efficiency because the function will be run many times. So, I would like to avoid reallocating memory as much as possible. The vector 'in' may already have a significant amount of memory reserved to it. So, if I made a manual implementation, I am sure that I could accomplish this. For example:

void Fun(std::vector<double> &in, std::vector<double> &out) {
    out.resize(in.size());//note - if the capacity of out is greater than in.size(), this will involve no memory release or allocation
    for (unsigned int i = 0;i<in.size();i++) {
        out[i] = in[i];
    }
}

If in.size() is less than the existing capacity of out, then this latter implementation will not release or assign any new heap memory.

The question is - would my original implementation do the same? Ie, would std::vector know to automatically do this in a memory efficient way if I simply did "out = in;"?

In particular, I am concerned that perhaps the "out = in;" line might do something like release all the heap memory currently allocated to out, and then reallocate it. Something effectively equivalent to this:

void Fun(std::vector<double> &in, std::vector<double> &out) {
    out.clear();
    out.shrink_to_fit();//releasing memory from heap
    out.resize(in.size());//reallocating memory from heap
    for (unsigned int i = 0;i<in.size();i++) {
        out[i] = in[i];
    }
}
Mark Mc
  • 99
  • 8
  • 1
    Regarding efficiency, your second method potentially double-initializes the vector. You'd be better off doing `out.assign(in.begin(), in.end());` As for the general question, see notes for [std::vector::operator=](https://en.cppreference.com/w/cpp/container/vector/operator%3D) which mentions _"the memory owned by *this may be reused when possible"_ – paddy Aug 08 '23 at 00:27
  • 1
    I removed the stl tag since this question appears to be about the C++ standard library's `std::vector` and not the original SGI Standard Template Library. See also [What's the difference between "STL" and "C++ Standard Library"?](https://stackoverflow.com/q/5205491/11082165) – Brian61354270 Aug 08 '23 at 00:27
  • 2
    I think when you're talking about capacity you have `in` and `out` reversed. – Mark Ransom Aug 08 '23 at 00:27
  • @SamVarshavchik That's probably enough for an answer – Brian61354270 Aug 08 '23 at 00:31
  • 1
    The specification only demands that the assignment operator have "linear complexity", which would be the case with or without the optimal behavior you describe. So while it's highly likely that all modern/widely used implementations do things as efficiently as possible, a less-efficient (but still linear-time) implementation would still be considered compliant with the C++ specification. Therefore this question can only be strictly answered if you a specify a particular implementation of `std::vector` that you are asking about. (in practice, though, I wouldn't worry about it) – Jeremy Friesner Aug 08 '23 at 01:25
  • I wouldn't bother with the second approach. The requirement to avoid any reallocation is that `out.capacity() >= in.size()` and, if that is not true, only a single reallocation and copy of elements would be required. It would be a pretty terrible quality (or deliberately implemented in an obtuse manner) implementation of `operator=()` that did multiple reallocations in that case. If you are calling `Fun()` multiple times, you can reduce reallocations by anticipating the maximum value of `in.size()` in advance, and setting `out.capacity()` to be (at least) that. – Peter Aug 08 '23 at 01:31
  • Good points about potentially double-initializing the vector. That wasn't really my point in the question, but agreed that would be an issue with the second implementation. – Mark Mc Aug 08 '23 at 02:20
  • And yes @MarkRansom there was a typo in the comment, now fixed. – Mark Mc Aug 08 '23 at 02:25
  • @Peter Yes, I may be overly paranoid in asking the question. I suppose it does seem likely that an STL implementation of the copy operator is going to avoid inefficiency. – Mark Mc Aug 08 '23 at 02:29
  • @paddy Thank you. You provided a link to the documentation which directly answers my question that "memory will be reused if possible". I believe this is a good answer to the question, I appreciate it! – Mark Mc Aug 08 '23 at 02:35

2 Answers2

5

The implementation of std::vector::operator= in your C++ library came into existence a very long time ago. It is a near certainty that its authors knew what they were doing and, in combination with a modern C++ compiler, it will generate optimal code to minimize the amount of work and dynamic allocation. The authors of operator= are very smart. They're smarter than me. They're smarter than you.

It's fairly likely that the shown attempt to optimize this operation will, at best, come out even. At its worst, when out is smaller that in.size() it'll end up spinning its wheels with a bunch of pointless zero-initialization in resize(), unless the compiler is smart enough to optimize it out.

If there's specific knowledge that in a particular use case the stock out=in; results in measurable suboptimal performance, then some kind of a micro-optimization might be prudent; but a mere "care about maximizing efficiency" should have something specific to evaluate.

Sam Varshavchik
  • 114,536
  • 5
  • 94
  • 148
  • 3
    Is this really an answer to the question? – xaxxon Aug 08 '23 at 01:13
  • 1
    I don't really agree with the perspective "it's probably gonna be ok". C++ can be squirrely with esoteric rules, and sometimes, code profiling can be yield misleading results if you don't understand the 'why or if you aren't careful to set up a test that can be misled by a compiler optimization'. So, sometimes, it's nice to ask a confirming question and be sure. – Mark Mc Aug 08 '23 at 02:23
  • 1
    Not only are the authors smart, but they have incentive to pull out all the stops when making this efficient. They know it's going to happen a lot, so they'll take the time to try tricks you wouldn't even think of. – Mark Ransom Aug 08 '23 at 04:30
  • 2
    it is true that, sometimes, a C++ compiler gets some innocent code that's perfectly fine but ends up compiling it to something horribly inefficient. But that's a rare, very rare exception. It is not the rule. So, if one was to always start implementing every kind of a micro-optimization that one could possibly think of, on the presumption that the end result is an improvement on the C++ compiler' or library's result, occasionally they'll be right, but most of the time their efforts will go to waste. And, when something goes sideways, it's pretty obvious, and then it'll be time to tweak it. – Sam Varshavchik Aug 08 '23 at 11:29
3

The vector 'in' may already have a significant amount of memory reserved to it.

[...] would my original implementation do the same?

I don't know of any implementation that reserves an excess of memory just because the vector being copied from has a lot of .capacity(). Also, your second snippet doesn't reserve more memory than needed to store all elements in in - and it does it by first default constructing the extra elements and then assigning new values to them. For doubles, it probably doesn't matter much, but for non-trivial types, it can make a noticeable difference. The original out = in; will never be slower than the code in your second snippet.

If you want to reserve as much memory as in has reserved, you can do it manually with .reserve():

void Fun(const std::vector<double>& in, std::vector<double>& out) {
    out.clear();
    out.reserve(in.capacity());
    out = in;
}

Regarding the additions to your question:

In particular, I am concerned that perhaps the out = in; line might do something like release all the heap memory currently allocated to out, and then reallocate it. Something effectively equivalent to this:

void Fun(std::vector<double> &in, std::vector<double> &out) {
    out.clear();
    out.shrink_to_fit();//releasing memory from heap
    out.resize(in.size());//reallocating memory from heap
    for (unsigned int i = 0;i<in.size();i++) {
       out[i] = in[i];
    }
}

No, that is highly unlikely. From cppreferece:

[...] Otherwise, the memory owned by *this may be reused when possible. In any case, the elements originally belonging to *this may be either destroyed or replaced by element-wise copy-assignment.

A sane implementation would do what it can to reuse the memory it already has allocated. The implementation will likely use different approaches for different types to make the copy assignment as effective as possible. For trivially copyable types (like double), it may out.resize(in.size()) and then std::copy. For more complex types, it may out.reserve(in.size());, copy assign the existing elements in out and then copy construct the rest (if out.size() < in.size()) - or it may out.clear(); out.reserve(in.size()); and then copy construct them all. It'll depend on the type in the container.

If out has more capacity than in, it will (almost certainly) not shrink_to_fit (which in theory could cause an unnecessary reallocation) but instead keep the over allocation for later use as demonstanted in this example where all major implementations keep the capacity even after copy assignment of a much smaller vector.

Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108
  • I added a bit to the question to clarify, I think it may have been unclear. – Mark Mc Aug 08 '23 at 02:24
  • 1
    I think doing `reserve` before `out=in` will be counter-productive. If a reallocation occurs all the existing data will be copied to the new memory, only to be overwritten a moment later. – Mark Ransom Aug 08 '23 at 04:26
  • @MarkRansom Doing it _after_ will be even worse assuming `out.size() < in.size()` when the call to the function is made. I added a `.clear()` to avoid unnecessary copying. – Ted Lyngmo Aug 08 '23 at 07:29
  • @MarkMc I added a note to address your updated question. – Ted Lyngmo Aug 08 '23 at 11:31
  • 1
    Thank you for the helpful answer! This quote from cppreference explicitly states that they will resuse memory if possible, and so I think this confirms the answer to the question. – Mark Mc Aug 09 '23 at 15:24