32

I have found some very weird behaviour (on clang and GCC) in the following situation. I have a vector, nodes, with one element, an instance of class Node. I then call a function on nodes[0] that adds a new Node to the vector. When the new Node is added, the calling object's fields are reset! However, they seem to return to normal again once the function has finished.

I believe this is a minimal reproducible example:

#include <iostream>
#include <vector>

using namespace std;

struct Node;
vector<Node> nodes;

struct Node{
    int X;
    void set(){
        X = 3;
        cout << "Before, X = " << X << endl;
        nodes.push_back(Node());
        cout << "After, X = " << X << endl;
    }
};

int main() {
    nodes = vector<Node>();
    nodes.push_back(Node());

    nodes[0].set();
    cout << "Finally, X = " << nodes[0].X << endl;
}

Which outputs

Before, X = 3
After, X = 0
Finally, X = 3

Though you would expect X to remain unchanged by the process.

Other things I have tried:

  • If I remove the line that adds a Node inside set(), then it outputs X = 3 every time.
  • If I create a new Node and call it on that (Node p = nodes[0]) then the output is 3, 3, 3
  • If I create a reference Node and call it on that (Node &p = nodes[0]) then the output is 3, 0, 0 (perhaps this one is because the reference is lost when the vector resizes?)

Is this undefined behaviour for some reason? Why?

Deduplicator
  • 44,692
  • 7
  • 66
  • 118
Qq0
  • 955
  • 7
  • 12
  • 4
    See https://en.cppreference.com/w/cpp/container/vector/push_back . If you were to have called `reserve(2)` on the vector before calling `set()` this would be defined behavior. But writing a function like `set` that requires the user to appropriately `reserve` enough size before calling it in order to avoid undefined behavior is bad design, so don't do it. – JohnFilleau Mar 02 '20 at 19:31

2 Answers2

40

Your code has undefined behavior. In

void set(){
    X = 3;
    cout << "Before, X = " << X << endl;
    nodes.push_back(Node());
    cout << "After, X = " << X << endl;
}

The access to X is really this->X and this is a pointer to the member of the vector. When you do nodes.push_back(Node()); you add a new element to the vector and that process reallocates, which invalidates all iterators, pointers and references to elements in the vector. That means

cout << "After, X = " << X << endl;

is using a this that is no longer valid.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
  • Is calling the `push_back` already undefined behavior (since we are then in a member function with invalidated `this`) or does UB occur the first time we use the `this` pointer? Would it be possible to i.e. `return 42;`? – n314159 Mar 02 '20 at 19:31
  • 3
    @n314159 `nodes` is independent from a `Node` instance so there is no UB in calling `push_back`. The UB is using the invalid pointer afterwards. – NathanOliver Mar 02 '20 at 19:33
  • @n314159 a good way to conceptualize this is to imagine a function `void set(Node* this)`, it is not undefined to pass it an invalid pointer, or to `free()` it in the function. I'm not sure but i imagine that even `((Node*) nullptr)->set()` is defined if you don't use `this` and the method is not virtual. – DutChen18 Mar 03 '20 at 09:59
  • I don't think that `((Node *) nullptr)->set()` is ok, since this dereferences a null pointer (you see that kore clearly when writing it equivalently as `(*((Node *) nullptr)).set();`). – n314159 Mar 03 '20 at 10:17
  • `.push_back()` *potentially* invalidates all pointers, references, and iterators. Was `.capacity() == .size()`? Yes, looks like it. – Deduplicator Mar 03 '20 at 13:04
15
nodes.push_back(Node());

will reallocate the vector, thus changing the address of nodes[0], but this is not updated.
try replacing the set method with this code:

    void set(){
        X = 3;
        cout << "Before, X = " << X << endl;
        cout << "Before, this = " << this << endl;
        cout << "Before, &nodes[0] = " << &nodes[0] << endl;
        nodes.push_back(Node());
        cout << "After, X = " << X << endl;
        cout << "After, this = " << this << endl;
        cout << "After, &nodes[0] = " << &nodes[0] << endl;
    }

note how &nodes[0] is different after calling push_back.

-fsanitize=address will catch this, and even tell you on which line the memory was freed if you also compile with -g.

DutChen18
  • 1,133
  • 1
  • 7
  • 24