0

I've stumbled accross a case where increasing the capacity of a vector hurts one of the variables related to its element, and I would like someone to help me understanding what exactly the issue is.

Let's say, I have a class MyObject and a container vector<MyObject> myVector which was already populated with 4 elements. I also have a method:

MyObject* GetFirstActiveElement(vector<MyObject> vec)
{
    for (auto& val : vec)
    {
        if (val->IsActive())
            return &val;
    }
    return nullptr;
}

I have then a piece of code that goes as follows:

MyObject myObject new MyObject();
MyObject* firstActiveElement = GetFirstActiveElement(myVector);
myVector.insert(myVector.begin() + 1, myObject); 

After the last line, if I check firstActiveElement, if it was not nullptr sometimes it is now junk.

After reading some docs, I've found that since myVector had 4 elements, and its default capacity is 4, inserting one more element causes its capacity to increase in a silent manner, whereas this C++ doc says:

If new_cap is greater than capacity(), all iterators, including the past-the-end iterator, and all references to the elements are invalidated. Otherwise, no iterators or references are invalidated.

I actually thought that firstActiveElement is just a pointer, so it should not be invalidated in any case. But apparently, it happens to be an interator or a reference to a vector, is that true? I'm a bit lost here, but I guess the reason is my design of the method GetFirstActiveElement().

Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
SBF
  • 345
  • 1
  • 2
  • 14
  • Questions seeking debugging help should generally provide a [mre] of the problem, which includes a function `main` and all `#include` directives. This allows other people to easily test your program, by simply using copy&paste. – Andreas Wenzel Jul 30 '22 at 12:03
  • @AndreasWenzel I guess my question mostly concerns the code design, since I suspect that my `GetFirstActiveElement` design provides some potential issues with the memory. My question is not about getting the same error, rather about me getting one and asking whether my code is dangerously written. – SBF Jul 30 '22 at 12:06
  • If the vector contains objects and the vector resizes then it's possible that every item in the vector moves so the "old" pointer would become invalid. – John3136 Jul 30 '22 at 12:12
  • @John3136 would that have been safer if I stored pointers to objects in the vector, and return them instead? – SBF Jul 30 '22 at 12:13
  • @Ilya it was me, not Andreas. I am insisting on a [mcve]! – πάντα ῥεῖ Jul 30 '22 at 12:19
  • 1
    *"I actually thought that firstActiveElement is just a pointer, so it should not be invalidated"* - that is precisely the reason why it ***is*** invalidated on a vector resize (or re-reserve) outside of already-reserved capacity. – WhozCraig Jul 30 '22 at 12:21
  • @πάνταῥεῖ are you still insisting, even though I've got an answer for a user who found sufficient info in the OP to give an answer? – SBF Jul 30 '22 at 12:25
  • @WhozCraig the doc talked about references and iterators; or am I reading it in a wrong way? I'm somewhat confused – SBF Jul 30 '22 at 12:26
  • 1
    @Ilya: A pointer is both an iterator and a reference, in the broader sense. Iterators and references are abstractions of pointers. – Andreas Wenzel Jul 30 '22 at 12:32
  • @AndreasWenzel thanks, that clears it. So anytime I see a statement about references/iterators, I should think it may also apply to a pointer, correct? – SBF Jul 30 '22 at 12:40
  • @Ilya well, basically your question is a duplicate of this one: https://stackoverflow.com/questions/6438086/iterator-invalidation-rules-for-c-containers – πάντα ῥεῖ Jul 30 '22 at 12:40
  • 2
    @Ilya: Well, I guess it depends on context. I believe the [text of the ISO C++ standard](https://eel.is/c++draft/) does clearly distinguish between pointers, iterators and references, but I am not sure. – Andreas Wenzel Jul 30 '22 at 12:44
  • 1
    @Ilya this isn't math.se, and the c++ community of SO. We're pretty strict and straight forward people here. – πάντα ῥεῖ Jul 30 '22 at 12:55
  • @Ilya: Actually, [§22.3.11.3 ¶7 of the ISO C++20 standard](https://timsong-cpp.github.io/cppwp/n4868/vector.capacity#7) explicitly states that pointers are also invalidated, not just iterators and references. – Andreas Wenzel Jul 30 '22 at 13:01
  • @πάνταῥεῖ ok, what can I say, I guess I should abide by your strictness :D but let me also be straight forward and tell you that I don't find it helpful to go through lenghty answers to a much broader question – SBF Jul 30 '22 at 13:02
  • @AndreasWenzel thanks a lot, so apparently I should rely cppreference.com with a thought in mind that there may still be some caveats? – SBF Jul 30 '22 at 13:03
  • @Ilya: In general, I think `cppreference.com` is very good. I am unaware of any better reference site. However, if you want to know something very exactly, then it is sometimes necessary to look up the text of the ISO standard itself. In this case, I don't blame `cppreference.com` for omitting the mentioning of pointers, because, as I have previously stated, pointers are iterators and references in a broader sense. – Andreas Wenzel Jul 30 '22 at 13:07
  • 1
    @Ilya: The reason why it is generally better to provide a [mre] is that it often happens that the OP asking a question describes a bug and only posts the code which he thinks contains the bug. It then often later turns out that the bug was not caused by the posted code, but rather by the code that the OP was not showing. However, in this case, this does not seem to be an issue. The bug you are describing does indeed seem to be caused by your posted code. – Andreas Wenzel Jul 30 '22 at 13:50
  • @AndreasWenzel that's perfectly logical, and I of course I thought of the very same reasons - by no means I can guarantee that the issue is within the code unless I've checked it. But with memory leaks in particular it's hard for me to guess how much of the code should I provide, given the random nature of the issue. – SBF Jul 30 '22 at 13:53

1 Answers1

5

Any access to the value returned by GetFirstActiveElement is always undefined behaviour, since the vector is passed by value to the function, inside the function you're dealing with copies of the MyObjects stored in the vector inside the calling function; those copies get destroyed when returning.

Even if you pass a reference resizing the vector may result in the addresses of the vector elements changing (or rather different objects being constructed in the new backing storage by moving the old objects.

The following example demonstrates this:

int main() {
    std::vector<int> v;
    v.push_back(1);
    void* p1 = &v[0];
    v.reserve(1000);
    void* p2 = &v[0];

    std::cout << "p1=" << p1 << "\np2=" << p2 << '\n';
}

Possible output:

p1=000001B4B85C5F70
p2=000001B4B85D29B0

If you want to keep addresses of the MyObjects stable, you could use a std::vector<std::unique_ptr<MyObject>> which however means that the vector can only be moved, not copied.

Andreas Wenzel
  • 22,760
  • 4
  • 24
  • 39
fabian
  • 80,457
  • 12
  • 86
  • 114
  • Thanks! If I may, why `p1` is of `void*` type and not `int*`? More importantly, will my issue be mitigated if I stored pointers as elemnts of the vector, not the actualy objects? I guess in that case the pointers will be relocated, but not the objects themselves. So even though the addresses of the pointers may change, their values will stay the same, and will point to the same objects before and after the increase of capacity, right? – SBF Jul 30 '22 at 12:27
  • @Ilya any pointer is implicitly convertible to `void*` (or `void const*`, if it's a pointer to const). `&v[0]` indeed does yield a `int*` but if you print and expression of type `void*` under some circumstances it can prevent "special behaviour" of the `<<` operator such as for `char*`. This isn't necessary in this scenario though. As for vectors of pointers: This indeed works, but you loose the objects being destroyed on the vector destruction. For unique pointers you do get the benefits of both with only minor modifications necessary. You must pass the vector as reference in that case though – fabian Jul 30 '22 at 13:07
  • Agree on passing vector as a reference, in my code it's actually a method of a class where `vector` is a member, so this method does not have any arguments; I think I'll go with the vector of pointers then, at least now I see a good reason why one would use that – SBF Jul 30 '22 at 13:12
  • @Ilya If you know about an upper bound of the number of elements, you could fix this by reserving enough capacity for the vector never to need to reallocate the backing storage, but in this case I'd recomment only providing move semantics for `MyObject`, not copy semantics to not accidentally make a copy of the vector instead of moving it... – fabian Jul 30 '22 at 13:25
  • could you pin point me to how can I guarantee there's only move and not copy semantics in MyObjects? someting like define copy constructor explicitly to be 0? – SBF Jul 30 '22 at 13:51
  • 1
    @Ilya You can mark functions including the copy and constructor/assignment operator as deleted, but since the presence of a user defined move constructor/operator marks copy constructor/assignment as deleted automatically and you need move samantics for `std::vector`, so you could do `struct MyObject{ MyObject(MyObject&&) = default; MyObject& operator=(MyObject&&) = default;};` (that is assuming the defaulted version works for you. Deleting the operators would look like this btw: `struct MyObject { MyObject(MyObject const&) = delete; MyObject& operator=(MyObject const&) = delete; };` – fabian Jul 30 '22 at 14:34
  • A better solution than `std::unique_ptr` would be to return an index into the vector instead of a pointer (or iterator). The index would remain valid across pointer resizes and even copying. Erasure of elements would invalidate it though, but then you shouldn't use a vector as erase is horribly slow there. – Goswin von Brederlow Jul 30 '22 at 15:00
  • @GoswinvonBrederlow as you could see in the example, I'm also inserting in the middle of the vector; that most definitely casts indices being not very reliable – SBF Jul 30 '22 at 15:32
  • @Ilya At least the index wouldn't crash or be UB after an insert. It just might point to a different object. But you are right that it won't be very reliable if inserting in the middle is a thing you do. You need a data structure that allows insertion without invalidation like a list or map. – Goswin von Brederlow Jul 31 '22 at 20:52