I am trying to implement Disjoint Sets for use in Kruskal's algorithm, but I am having trouble understanding exactly how it should be done and in particular, how to manage the forest of trees. After reading the Wikipedia description of Disjoint Sets and after reading the description in Introduction to Algorithms (Cormen et al) I have come up with the following:
class DisjointSet
{
public:
class Node
{
public:
int data;
int rank;
Node* parent;
Node() : data(0),
rank(0),
parent(this) { } // the constructor does MakeSet
};
Node* find(Node*);
Node* merge(Node*, Node*); // Union
};
DisjointSet::Node* DisjointSet::find(DisjointSet::Node* n)
{
if (n != n->parent) {
n->parent = find(n->parent);
}
return n->parent;
}
DisjointSet::Node* DisjointSet::merge(DisjointSet::Node* x,
DisjointSet::Node* y)
{
x = find(x);
y = find(y);
if (x->rank > y->rank) {
y->parent = x;
} else {
x->parent = y;
if (x->rank == y->rank) {
++(y->rank);
}
}
}
I am pretty sure this is incomplete though and that I am missing something.
Introduction to Algorithms mentions that there should be a forest of trees, but it does not give any explanation for a practical implementation of this forest. I watched CS 61B Lecture 31: Disjoint Sets ( http://www.youtube.com/watch?v=wSPAjGfDl7Q ) and here the lecturer uses only an array to store both the forest and all its trees and values. There is no explicit 'Node' type of class as I have, mentioned. I have also found many other sources (I cannot post more than one link), which also use this technique. I would be happy to do this, except that this relies on the indices of the array for lookup and since I want to store values of type other than int, I need to use something else (std::map comes to mind).
Another issue that I am unsure about is memory allocation because I am using C++. I am storing trees of pointers and my MakeSet operation will be: new DisjointSet::Node; . Now, these Nodes only have pointers to their parents, so I'm not sure how to find the bottom of a tree. How will I be able to traverse my trees to deallocate them all?
I understand the basic concept of this data structure, but I'm just a bit confused about the implementation. Any advice and suggestions would be most welcome, thank you.