0

I'm trying to implement a disjoint Forest Data Structure. In short, it is a data structure of sets with no common elements and can easily perform operations such as combining 2 sets and finding the set of an element. Each set has certain number of elements.

I'm implementing a set as a tree and each element of a set is a Node type. Here's my DisjointForesh.h file:

#ifndef STD_VECTOR
#define STD_VECTOR
#include <vector>
#endif

#ifndef DISJOINTFOREST_DISJOINTFOREST_H
#define DISJOINTFOREST_DISJOINTFOREST_H

struct TreeRoot;

struct Node{
    int rank = 0;
    int id;
    TreeRoot* parentTree;
    Node* parent;
    int value;
};

struct TreeRoot{
    std::vector<Node *> nodes;
    int id;
};

TreeRoot makeSet(int x);
.......

#endif //DISJOINTFOREST_DISJOINTFOREST_H

DisjointForest.cpp:

#ifndef STD_VECTOR
#define STD_VECTOR
#include <vector>
#endif
#include "DisjointForest.h"

TreeRoot makeSet(int x){
    TreeRoot tree;
    Node node;
    node.value = x;
    node.parent = &node;
    tree.nodes.push_back(&node);
    tree.id = node.value;
    node.parentTree = &tree;
    return tree;
}
........

Initialially, I make a Tree, containing one node. The node parent points to itself (initially) and parentTree points to the Tree the node belongs to. Each Tree has a vector of pointers to nodes, determining all the nodes it contains and an id which is the same as the value of the first node in the vector/the value of node with which it is created.

I plan to do many Union operations where a nodes parent will point to other nodes and parentTree will point to other Trees. Similarly, find-set(x) operation should return a tree, where Node x belongs.

Here's my main.cpp:

#include <iostream>
#include "DisjointForest.h"

void printParentAddress(TreeRoot &tree){
    for(Node* nodes: tree.nodes){
        std::cout << nodes->value << "-->" << nodes->parentTree->id << '\n';
    }
}

void printNodesInTree(TreeRoot &tree){
    for(Node* nodes: tree.nodes){
        std::cout << nodes->value << ' ';
    }
}


int main() {
    std::vector<TreeRoot> trees;
    std::vector<Node *> nodes;
    for(int index = 0; index < 8; ++index){
        TreeRoot tree = makeSet(index);
        Node *node = tree.nodes[0];
        trees.push_back(tree);
        nodes.push_back(node);
    }


    std::cout << "Total trees: " << trees.size() << '\n';
    std::cout << "Total nodes: " << nodes.size() << '\n';
//
    std::cout << "Printing all nodes with parent trees:\n";
    for(Node *node: nodes){
        std::cout << node->value << "-->" << node->parentTree->id << '\n';
        std::cout << node << '\n';
}

    return 0;
}

I get the following output:

Total trees: 8
Total nodes: 8
Printing all nodes with parent trees:
-749254164-->0
0x7ffe0cd60cf0
-749254164-->0
0x7ffe0cd60cf0
-749254164-->0
0x7ffe0cd60cf0
-749254164-->0
0x7ffe0cd60cf0
-749254164-->0
0x7ffe0cd60cf0
-749254164-->0
0x7ffe0cd60cf0
-749254164-->0
0x7ffe0cd60cf0
-749254164-->0
0x7ffe0cd60cf0

As you can see:

  1. Each Nodes value is a garbage value
  2. Each Node has the same address
  3. Parent ID is 0 while it should be the same as nodes value

I don't understand why the above three are happening. I was expecting the values from 0-7 and different addresses for each node.I'm relatively new to c++ and therefore, I'm finding it difficult to debug this program.

Could you please help me figure out where I'm going wrong? Any suggestions to optimize my code will also be helpful. Please let me know if any additional information is required.

Mohit Motwani
  • 4,662
  • 3
  • 17
  • 45
  • You're pushing the same `Node* node` in `nodes` vector so that's why you're getting the same addresses and same values – Vladimir Nikitin May 13 '20 at 11:20
  • `makeSet` returns a `TreeRoot` whose data members point to the locally scoped `Node` `node` (amonst other things) which subsequently goes out of scope. That's undefined behaviour. – G.M. May 13 '20 at 11:20
  • @G.M. Do you mean that the node variables in the TreeRoot are cleared of memory immediately after TreeRoot is created? – Mohit Motwani May 13 '20 at 11:25
  • @VladimirNikitin I don't understand. I'm creating a new Node pointer every loop and not the same, yet the they have same addresses. – Mohit Motwani May 13 '20 at 11:26
  • I mean they become dangling pointers. They reference a memory location that no longer represents a valid `Node` instance. – G.M. May 13 '20 at 11:27
  • @MohitMotwani The storage for objects that have ceased to exist is reused, otherwise you would run out of memory in no time. All those `Node`s just happen to be created in the same place. (And accessing them later has undefined behaviour.) – molbdnilo May 13 '20 at 11:29
  • @MohitMotwani try creating some variable on stack before calling makeSet on even iterations of loop and calling makeSet without creating this variable on odd iterations - you should get different addresses – Vladimir Nikitin May 13 '20 at 11:32
  • @ G.M. I see what you mean. Node object is deleted and the Tree nodes vector is pointing to this location where the Node object used to be. Do you have any suggestions how I can fix this? In particular which type of data structure should I use so that nodes aren't deleted and I can hold these nodes in the TreeRoot object? – Mohit Motwani May 13 '20 at 11:33
  • @molbdnilo This makes so much sense. But then how else can I make pointers in new locations that point to different nodes? – Mohit Motwani May 13 '20 at 11:36
  • 2
    Adding a custom include guard for `#include ` doesn't make much sense. Each standard header should already have one. – HolyBlackCat May 13 '20 at 11:41
  • 1
    @MohitMotwani Read about `new` in your favourite [C++ book](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). And start with simpler data structures, like a linked list. – molbdnilo May 13 '20 at 11:46

1 Answers1

1
TreeRoot makeSet(int x){
    TreeRoot tree;
    Node node;
    node.value = x;
    node.parent = &node;
    tree.nodes.push_back(&node);
    tree.id = node.value;
    node.parentTree = &tree;
    return tree;
}

node is local to the block of this for function and has automatic storage. At the end of the function, the lifetime of node ends and it is automatically destroyed.

The returned tree points to this destroyed node. Those pointers become invalid as soon as the function returns.

for(Node *node: nodes){
    std::cout << node->value << "-->" << node->parentTree->id << '\n';

Here, you indirect through invalid pointers attempting to access non-existing objects. The behaviour of the program is undefined.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • I see. From the your answer and comments above, I understand that my code has many out of scope errors(dangling pointers etc). Any idea how I can approach fixing my code from here on? I'm really not sure how do I make my objects not be deleted. – Mohit Motwani May 13 '20 at 11:45
  • @MohitMotwani How about using `std::set` or `std::unordered_set`? – eerorika May 13 '20 at 11:46
  • You are right, I can use this. But I believe implementing these would really help me understand c++ better (as that is my primary objective) than using the structures from the standard library. – Mohit Motwani May 13 '20 at 11:53
  • @MohitMotwani I recommend starting with simpler data structures instead. And you'll probably need a book. Guesswork is not the best strategy in learning a language. – eerorika May 13 '20 at 11:58
  • Thank you @eerorika. I'll keep that in mind. Presently referring learncpp.com. – Mohit Motwani May 13 '20 at 12:09