3

Let us consider the following c++ code

#include <iostream>
#include <vector>

class A {
  int x, y;
public:
  A(int x, int y) : x(x), y(y){}
  friend std::ostream & operator << (std::ostream & os, const A & a){
    os << a.x << " " << a.y;
    return os;
  }
};

int main(){

  std::vector<A> a;
  std::vector<const A*> b;

  for(int i = 0; i < 5; i++){
    a.push_back(A(i, i + 1));
    b.push_back(&a[i]);
  }

  while(!a.empty()){
    a.pop_back();
  }

  for(auto x : b)
    std::cout << *x << std::endl;

  return 0;
}

Using a debugger I noticed that after the first insertion is done to a the address of a[0] changes. Consequently, when I'm printing in the second for loop I get an unvalid reference to the first entry. Why does this happen?

Thanks for your help!

  • 2
    When the `vector` has to increase its capacity to hold more elements, it's possible (or maybe it always happens?) that new memory is allocated and all elements are copied to the new space. If you want to prevent this, use `a.reserve(5);` before adding elements. – JohnFilleau Mar 20 '20 at 17:06
  • 5
    Look at the section "iterator invalidation" on this page https://en.cppreference.com/w/cpp/container/vector – JohnFilleau Mar 20 '20 at 17:07
  • 2
    If elements in `b` point at elements in `a` and you then empty `a`, what is there for the elements in `b` to point at? – user4581301 Mar 20 '20 at 17:08
  • @John Thanks for your reply! I didn't think of that. Yeah, I'm not sure about the implementation details of vector, but I'm sure that it increases its size after some threshold to have O(1) amortized cost. Perhaps the pointers are no longer valid when it moves those elements! Thanks again – Jose Abel Castellanos Joo Mar 20 '20 at 17:14
  • @user4581301 I was thinking that the problem was there. But it clearly it's not the case since the only problematic case is just the first entry. – Jose Abel Castellanos Joo Mar 20 '20 at 17:15
  • 1
    Not true, @Jose . The only **visible** problematic case is the first. Accessing a pointer to an invalid object invokes [Undefined Behaviour](https://en.cppreference.com/w/cpp/language/ub), and UB may look like it works, visibly not work, or anything and everything in between. UB is probably the single scariest thing in all of programming. – user4581301 Mar 20 '20 at 17:19
  • @user4581301 Ohh, that's interesting! I didn't about that. Thanks for your reply! – Jose Abel Castellanos Joo Mar 20 '20 at 17:24

3 Answers3

5
  for(int i = 0; i < 5; i++){
    a.push_back(A(i, i + 1)); //add a new item to a
    b.push_back(&a[i]); // point at the new item in a
  }

The immediate problem is Iterator invalidation. As a grows, it reallocates its storage for more capacity. This may leave the pointers in b pointing to memory that has been returned to the freestore (probably the heap). Accessing these pointers invokes Undefined Behaviour and anything could happen. There are a few solutions to this, such as reserving space ahead of time to eliminate reallocation or using a container with more forgiving invalidation rules, but whatever you do is rendered moot by the next problem.

  while(!a.empty()){
    a.pop_back(); // remove item from `a`
  }

Since the items in b point to items in a and there are no items in a, all of the pointers in b now reference invalid objects and cannot be accessed without invoking Undefined Behaviour.

All of the items in a referenced by items in b must remain alive as long as the item in b exists or be removed from a and b.

In this trivial case that answer is simple, don't empty a, but that defeats the point of the example. There are many solutions to the general case (just use a, store copies rather than pointers in b, use std::shared_ptr and store shared_ptrs to As in both a and b) but to make useful suggestions we need to know how a and b are being consumed.

user4581301
  • 33,082
  • 7
  • 33
  • 54
2

std::vector is basically a dynamic array. Size of a dynamic array is not known at compile time and keeps changing at runtime. Therefore, whenever you fill elements into it, it has to keep growing. When it can't grow contiguously, the system has to look for a new contiguous block of memory that could hold that many elements. This answers your first question, as the base address of the vector changes.

Consequently, the address of all elements in the vector changes. This is a sufficient reason to cause the error in your second question. Moreover, you empty the contents of the first vector, to which the elements in your second vector point at. Obviously, this would cause an invalid dereferencing inside your second for loop.

Ardent Coder
  • 3,777
  • 9
  • 27
  • 53
  • Yeah, the main problem is when std::vector resizes the dynamic table. I was perplexed that the compiler didn't complain when it was printing stuff inside the second for loop. Thanks for your response! – Jose Abel Castellanos Joo Mar 20 '20 at 17:23
  • 1
    @JoseAbelCastellanosJoo Because the memory may still be accessible but the contents could be anything. This is known as **Undefined Behaviour**. – Ardent Coder Mar 20 '20 at 17:26
2

When you add more elements to a std::vector than it has capacity, it will allocate new storage, move all of its elements to the new, larger, storage, and then finally free its old storage. When this happens, all pointers, references, and iterators to the elements in the vector's old storage become invalid.

To avoid having this happen you can use std::vector::reserve to pre-allocate enough storage for all of the elements you're going to add to the vector. I would advise against doing that though. It's brittle and very easy to screw something up and wander into undefined behavior. If you need to store elements of one vector in another you should prefer storing indices. Another option is to use an address-stable container like std::list instead of std::vector.

Miles Budnek
  • 28,216
  • 2
  • 35
  • 52