0

I encountered a very strange bug while trying to implement persistent segment trees (the context isn't really relevant). If is assign a variable like this: nodes[id].lc = init(l,mid) Sometimes nodes[id].lc is -1, even though init() never returns -1. If however, I simply assign it like:

int id_l = init(l,mid);
nodes[id].lc = id_l;

The bug goes away. What is happening? Here is the code for greater context:

Here is the code that does not work:

#include <bits/stdc++.h>
using namespace std;

struct Node {
    int lc,rc,val;
    Node() : lc(-1), rc(-1),val(0) {

    }
};
vector<Node> nodes;
int n,q;
vector<int> a;

///creates a new node for this interval and returns the id of it
int init(int l, int r) {
    nodes.push_back(Node{});
    int id = (int)nodes.size() - 1;
    if ( l == r) {
        nodes[id].val = a[l];
    } else {
        int mid = (l + r) / 2;
        nodes[id].lc = init(l,mid);
        nodes[id].rc = init(mid + 1, r);

        cout << "nodes[id].lc = " << nodes[id].lc << "\n";
        cout << "nodes[id].rc = " << nodes[id].rc << "\n";
        nodes[id].val = nodes[nodes[id].lc].val + nodes[nodes[id].rc].val;
    }

    cout << "id = " << id << "\n";
    return id;
}  
    
int main() {
    cin >> n >> q;
    a.resize(n + 1);
    for (int i = 1; i <= n; i++) {
        cin >> a[i];
    }

    vector<int> root; 
    root.push_back(init(1 , n));

    return 0;
}

The output for input 6 0 1 2 3 4 5 6 is

6 0 1 2 3 4 5 6
id = 3
id = 4
nodes[id].lc = 3
nodes[id].rc = -1
id = 2
id = 5
nodes[id].lc = -1
nodes[id].rc = 5
id = 1
id = 8
id = 9
nodes[id].lc = -1
nodes[id].rc = 9
id = 7
id = 10
nodes[id].lc = -1
nodes[id].rc = 10
id = 6
nodes[id].lc = -1
nodes[id].rc = -1
id = 0

The problem is that sometimes nodes[id].lc is -1, even though id is never -1 when it is returned from init

However, all we have to do to make the code work is change the init function to this:

int init(int l, int r) {
    nodes.push_back(Node{});
    int id = (int)nodes.size() - 1;
    if ( l == r) {
        nodes[id].val = a[l];
    } else {
        int mid = (l + r) / 2;
        int id_l = init(l,mid);
        int id_r = init(mid + 1, r);
        nodes[id].lc = id_l;
        nodes[id].rc = id_r;

        cout << "nodes[id].lc = " << nodes[id].lc << "\n";
        cout << "nodes[id].rc = " << nodes[id].rc << "\n";
        nodes[id].val = nodes[nodes[id].lc].val + nodes[nodes[id].rc].val;
    }

    cout << "id = " << id << "\n";
    return id;
}  

All we did was introduce the variables id_l, and id_r, and now the output is: (same input, 6 0 1 2 3 4 5 6)

id = 3
id = 4
nodes[id].lc = 3
nodes[id].rc = 4
id = 2
id = 5
nodes[id].lc = 2
nodes[id].rc = 5
id = 1
id = 8
id = 9
nodes[id].lc = 8
nodes[id].rc = 9
id = 7
id = 10
nodes[id].lc = 7
nodes[id].rc = 10
id = 6
nodes[id].lc = 1
nodes[id].rc = 6
id = 0

So everything works fine, there are no more -1s.

What is happening here?

Jeff
  • 97
  • 4
  • 2
    [Why should I not #include ?](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h) [Why is "using namespace std;" considered bad practice?](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice) And don't use global variables. Use classes, or pass as arguments. Don't use single or two-letter names, use semantically relevant and descriptive names. And don't use one-based indexing, that will only confuse everyone. – Some programmer dude Mar 30 '23 at 11:23
  • 3
    The issue is that `init` introduces side-effects to `nodes`, so using the temporary variables will _not_ be equivalent to direct assignment – Den-Jason Mar 30 '23 at 11:25
  • 2
    Immediately dumping a new node into the vector, and then recursively calling the function sounds like a bad idea. Learning about `emplace_back()` can alleviate that. – sweenish Mar 30 '23 at 11:27
  • 2
    [What is a debugger and how can it help me diagnose problems?](https://stackoverflow.com/q/25385173/4641116) – Eljay Mar 30 '23 at 11:41
  • @sweenish Could you please elaborate on why it's a bad idea? My understanding is that emplace_back() saves execution time by not creating a new unnecessary object instance, but I don't see how that relates to me recursively calling the function. – Jeff Mar 30 '23 at 13:09
  • Every recursive call creates a new node. Is that actually your desired behavior? – sweenish Mar 30 '23 at 13:21
  • @sweenish Yes, in this case the function is supposed to create an entire new segment tree every time it is called. I understand why that might look like a mistake at a quick glance, though. – Jeff Mar 30 '23 at 14:17

1 Answers1

1

A statement like nodes[id].lc = init(...) boils down to three steps:

  1. Calculate the address of nodes[id] and from there get the address of the lc field
  2. Invoke init
  3. Assign the returned value to the address.

Step 2 involves calling nodes.push_back, which will occasionally trigger a resizing of the vector. As noted in the documentation, that invalidates existing pointers into the storage of the vector. The result of the assignment is thus that you write to the old lc field, while the new one was copied when it was still -1 as part of the push_back call.

Botje
  • 26,269
  • 3
  • 31
  • 41