Implementing Disjoint Sets (Union Find) in C++

Question

I am trying to implement Disjoint Sets for use in Kruskal's algorithm, but I am having trouble understanding exactly how it should be done and in particular, how to manage the forest of trees. After reading the Wikipedia description of Disjoint Sets and after reading the description in Introduction to Algorithms (Cormen et al) I have come up with the following:

    class DisjointSet
    {
    public:
        class   Node 
        {
        public:
            int         data;
            int         rank;

            Node*       parent;

            Node() : data(0), 
                     rank(0),
                     parent(this) { } // the constructor does MakeSet
        };

        Node*   find(Node*);
        Node*   merge(Node*, Node*); // Union
    };


    DisjointSet::Node*   DisjointSet::find(DisjointSet::Node* n)
    {
        if (n != n->parent) {
            n->parent = find(n->parent);
        }
        return n->parent;
    }

    DisjointSet::Node* DisjointSet::merge(DisjointSet::Node* x,
                                          DisjointSet::Node* y)
    {
        x = find(x);
        y = find(y);

        if (x->rank > y->rank) {
            y->parent = x;
        } else {
            x->parent = y;
            if (x->rank == y->rank) {
                ++(y->rank);
            }
        }
    }

I am pretty sure this is incomplete though and that I am missing something.

Introduction to Algorithms mentions that there should be a forest of trees, but it does not give any explanation for a practical implementation of this forest. I watched CS 61B Lecture 31: Disjoint Sets ( http://www.youtube.com/watch?v=wSPAjGfDl7Q ) and here the lecturer uses only an array to store both the forest and all its trees and values. There is no explicit 'Node' type of class as I have, mentioned. I have also found many other sources (I cannot post more than one link), which also use this technique. I would be happy to do this, except that this relies on the indices of the array for lookup and since I want to store values of type other than int, I need to use something else (std::map comes to mind).

Another issue that I am unsure about is memory allocation because I am using C++. I am storing trees of pointers and my MakeSet operation will be: new DisjointSet::Node; . Now, these Nodes only have pointers to their parents, so I'm not sure how to find the bottom of a tree. How will I be able to traverse my trees to deallocate them all?

I understand the basic concept of this data structure, but I'm just a bit confused about the implementation. Any advice and suggestions would be most welcome, thank you.

Check this and see what you are missing http://www.emilstefanov.net/Projects/DisjointSets.aspx — DumbCoder, Dec 21 '10 at 11:46
The example uses static objects and if you want to use dynamic memory management use new and delete (http://www.cplusplus.com/doc/tutorial/dynamic) — DumbCoder, Dec 21 '10 at 11:49

score 3 · Answer 1 · answered Oct 09 '13 at 16:38

3

Boost has a disjoint set implementation:

http://www.boost.org/doc/libs/1_54_0/libs/disjoint_sets/disjoint_sets.html

answered Oct 09 '13 at 16:38

THK

486
3
11

score 3 · Answer 2 · answered Oct 09 '13 at 17:15

your implementation looks good (except in the function merge you should either declare returning void or put a return there, I prefer to return void). The thing is you need to track the Nodes*. You can do that by having a vector<DisjointSet::Node*> on your DisjointSet class or having this vector somewhere else and declaring the methods of DisjointSet as static.

Here is an example of run (note that I changed merge to return void and didn't change the methods of DisjointSet to be static:

int main()
{
    vector<DisjointSet::Node*> v(10);
    DisjointSet ds;

    for (int i = 0; i < 10; ++i) {
        v[i] = new DisjointSet::Node();
        v[i]->data = i;
    }

    int x, y;

    while (cin >> x >> y) {
        ds.merge(v[x], v[y]);
    }

    for (int i = 0; i < 10; ++i) {
        cout << v[i]->data << ' ' << v[i]->parent->data << endl;
    }

    return 0;
}

With this input:

It prints the expected:

Your forest is the composition of the trees:

   7    1       5    6   9
 /    / | \              |
0    2  3  4             8

So you algorithm is good, have the best complexity for Union-find as far as I know and you keep track of your Nodes on your vector. So you could simply deallocate:

for (int i = 0; i < int(v.size()); ++i) {
    delete v[i];
}

score 3 · Answer 3 · answered Dec 21 '10 at 12:36

Not a perfect implementation by any means (I did write it after all!), but does this help?

/***
 * millipede: DisjointSetForest.h
 * Copyright Stuart Golodetz, 2009. All rights reserved.
 ***/

#ifndef H_MILLIPEDE_DISJOINTSETFOREST
#define H_MILLIPEDE_DISJOINTSETFOREST

#include <map>

#include <common/exceptions/Exception.h>
#include <common/io/util/OSSWrapper.h>
#include <common/util/NullType.h>

namespace mp {

/**
@brief  A disjoint set forest is a fairly standard data structure used to represent the partition of
        a set of elements into disjoint sets in such a way that common operations such as merging two
        sets together are computationally efficient.

This implementation uses the well-known union-by-rank and path compression optimizations, which together
yield an amortised complexity for key operations of O(a(n)), where a is the (extremely slow-growing)
inverse of the Ackermann function.

The implementation also allows clients to attach arbitrary data to each element, which can be useful for
some algorithms.

@tparam T   The type of data to attach to each element (arbitrary)
*/
template <typename T = NullType>
class DisjointSetForest
{
    //#################### NESTED CLASSES ####################
private:
    struct Element
    {
        T m_value;
        int m_parent;
        int m_rank;

        Element(const T& value, int parent)
        :   m_value(value), m_parent(parent), m_rank(0)
        {}
    };

    //#################### PRIVATE VARIABLES ####################
private:
    mutable std::map<int,Element> m_elements;
    int m_setCount;

    //#################### CONSTRUCTORS ####################
public:
    /**
    @brief  Constructs an empty disjoint set forest.
    */
    DisjointSetForest()
    :   m_setCount(0)
    {}

    /**
    @brief  Constructs a disjoint set forest from an initial set of elements and their associated values.

    @param[in]  initialElements     A map from the initial elements to their associated values
    */
    explicit DisjointSetForest(const std::map<int,T>& initialElements)
    :   m_setCount(0)
    {
        add_elements(initialElements);
    }

    //#################### PUBLIC METHODS ####################
public:
    /**
    @brief  Adds a single element x (and its associated value) to the disjoint set forest.

    @param[in]  x       The index of the element
    @param[in]  value   The value to initially associate with the element
    @pre
        -   x must not already be in the disjoint set forest
    */
    void add_element(int x, const T& value = T())
    {
        m_elements.insert(std::make_pair(x, Element(value, x)));
        ++m_setCount;
    }

    /**
    @brief  Adds multiple elements (and their associated values) to the disjoint set forest.

    @param[in]  elements    A map from the elements to add to their associated values
    @pre
        -   None of the elements to be added must already be in the disjoint set forest
    */
    void add_elements(const std::map<int,T>& elements)
    {
        for(typename std::map<int,T>::const_iterator it=elements.begin(), iend=elements.end(); it!=iend; ++it)
        {
            m_elements.insert(std::make_pair(it->first, Element(it->second, it->first)));
        }
        m_setCount += elements.size();
    }

    /**
    @brief  Returns the number of elements in the disjoint set forest.

    @return As described
    */
    int element_count() const
    {
        return static_cast<int>(m_elements.size());
    }

    /**
    @brief  Finds the index of the root element of the tree containing x in the disjoint set forest.

    @param[in]  x   The element whose set to determine
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    int find_set(int x) const
    {
        Element& element = get_element(x);
        int& parent = element.m_parent;
        if(parent != x)
        {
            parent = find_set(parent);
        }
        return parent;
    }

    /**
    @brief  Returns the current number of disjoint sets in the forest (i.e. the current number of trees).

    @return As described
    */
    int set_count() const
    {
        return m_setCount;
    }

    /**
    @brief  Merges the disjoint sets containing elements x and y.

    If both elements are already in the same disjoint set, this is a no-op.

    @param[in]  x   The first element
    @param[in]  y   The second element
    @pre
        -   Both x and y must be elements in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    */
    void union_sets(int x, int y)
    {
        int setX = find_set(x);
        int setY = find_set(y);
        if(setX != setY) link(setX, setY);
    }

    /**
    @brief  Returns the value associated with element x.

    @param[in]  x   The element whose value to return
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    T& value_of(int x)
    {
        return get_element(x).m_value;
    }

    /**
    @brief  Returns the value associated with element x.

    @param[in]  x   The element whose value to return
    @pre
        -   x must be an element in the disjoint set forest
    @throw Exception
        -   If the precondition is violated
    @return As described
    */
    const T& value_of(int x) const
    {
        return get_element(x).m_value;
    }

    //#################### PRIVATE METHODS ####################
private:
    Element& get_element(int x) const
    {
        typename std::map<int,Element>::iterator it = m_elements.find(x);
        if(it != m_elements.end()) return it->second;
        else throw Exception(OSSWrapper() << "No such element: " << x);
    }

    void link(int x, int y)
    {
        Element& elementX = get_element(x);
        Element& elementY = get_element(y);
        int& rankX = elementX.m_rank;
        int& rankY = elementY.m_rank;
        if(rankX > rankY)
        {
            elementY.m_parent = x;
        }
        else
        {
            elementX.m_parent = y;
            if(rankX == rankY) ++rankY;
        }
        --m_setCount;
    }
};

}

#endif

I appreciate it, but it doesn't really answer my questions. I can already find many such implementations online and I have the same problem: I don't understand why they do certain things. For example, in yours, I don't understand why you use std::map, why you use an int for Element::m_parent (I would expect a pointer), why there is a union_sets on int x, int y (ints aren't sets...), and I could go on. It would be more helpful to point out what is wrong with mine, and what I need to do to finish it, sorry. — Isaac, Dec 21 '10 at 12:45
@Isaac: I used `std::map` because I wanted to allow element indices to be non-contiguous, I used an `int` for `Element::m_parent` because it's simpler than the pointer version and easier to check for correctness (it's probably not as efficient as it could be, but that wasn't the main criterion when I was writing it), and `union_sets` unions the sets containing `x` and `y`, in the same way that `find_set` finds the root of the set containing `x`. Hope that clarifies those things at least. — Stuart Golodetz, Dec 21 '10 at 12:55
@Isaac: With reference to yours, essentially the major difference between the two is that you've elected to go down the manual memory management route which is leaving you worrying about deallocation, and I've gone out of my way to avoid that (because I've done that sort of thing in the past and know that it's a bit of a pain). My suggestion would be to avoid that like the plague -- just store elements the sort of way I'm doing (i.e. as the value type of a `map`) and save yourself a lot of hassle. If there does turn out to be a performance issue later on, you can always improve things then. — Stuart Golodetz, Dec 21 '10 at 12:59
So, I've read your code a bit more thoroughly. Am I right in thinking that you map, parent -> element(value, parent)? And that your ints are just ids? — Isaac, Dec 21 '10 at 13:28
@Isaac: Not quite -- I effectively map id -> element(value,parent). Initially, each element's parent is the element itself, so I start out with id -> element(value,id). As different sets are unioned together, this changes. (And yes, ints are just ids.) I'm incurring some unnecessary lookup costs in doing it this way -- but as I said, it wasn't designed with performance as a key criterion. It wouldn't be too hard to optimise it. — Stuart Golodetz, Dec 21 '10 at 18:05
Note that technically elements include a notion of rank as well, which is an upper bound on an element's height in the tree representing its disjoint set. It's used for the union-by-rank heuristic, and managed in the `link` method. — Stuart Golodetz, Dec 21 '10 at 18:07
@StuartGolodetz Why do you use `std::map` when you only need `std::unordered_map`? In this way you are increasing the overall complexity, and you actually do not need a collection to be sorted. — Dejan, Nov 29 '18 at 18:56
@Dejan: This was implemented in 2009, before `std::unordered_map` was added to C++. — Stuart Golodetz, Dec 01 '18 at 01:42

score 2 · Answer 4 · answered Dec 21 '10 at 12:09

I can't speak about the algorithm, but for memory management, typically you will use something called a smart pointer, which will free what it points to. You can get shared ownership and single ownership smart pointers, and non-ownership too. Correct use of these will guarantee no memory issues.

score 0 · Answer 5 · edited Apr 12 '13 at 11:41

0

This blog article shows a C++ implementation using path compression: http://toughprogramming.blogspot.com/2013/04/implementing-disjoint-sets-in-c.html

edited Apr 12 '13 at 11:41

Unai Vivi

3,073
3
30
46

answered Apr 12 '13 at 11:20

user3310464

9
1

score 0 · Answer 6 · edited Apr 19 '16 at 23:50

Have a look at this code

class Node {
    int id,rank,data;
    Node *parent;

public :

    Node(int id,int data) {
        this->id = id;
        this->data = data;
        this->rank =0;
        this->parent = this;
    }

    friend class DisjointSet;
};

class DisjointSet {
    unordered_map<int,Node*> forest;

    Node *find_set_helper(Node *aNode) {
        if( aNode->parent != aNode)
            aNode->parent = find_set_helper(aNode->parent);

        return aNode->parent;
    }

    void link(Node *xNode,Node *yNode) {
        if( xNode->rank > yNode->rank)
            yNode->parent = xNode;
        else if(xNode-> rank < yNode->rank)
            xNode->parent = yNode;
        else {
            xNode->parent = yNode;
            yNode->rank++;
        }
    }

public:
    DisjointSet() {

    }

    void make_set(int id,int data) {
        Node *aNode = new Node(id,data);
        this->forest.insert(make_pair(id,aNode));
    }

    void Union(int xId, int yId) {
        Node *xNode = find_set(xId);
        Node *yNode = find_set(yId);

        if(xNode && yNode)
            link(xNode,yNode);
    }

    Node* find_set(int id) {
        unordered_map<int,Node*> :: iterator itr = this->forest.find(id);
        if(itr == this->forest.end())
            return NULL;

        return this->find_set_helper(itr->second);
    }

    void print() {
        unordered_map<int,Node*>::iterator itr;
        for(itr = forest.begin(); itr != forest.end(); itr++) {
            cout<<"\nid : "<<itr->second->id<<"  parent :"<<itr->second->parent->id;
        }
    }

    ~DisjointSet(){
        unordered_map<int,Node*>::iterator itr;
        for(itr = forest.begin(); itr != forest.end(); itr++) {
            delete (itr->second);
        }
    }

};

I think the constructor of the class DisjointSet is missing. — pantelis300, Jan 23 '15 at 14:16

score 0 · Answer 7 · answered Feb 23 '15 at 00:11

For implementing Disjoint Sets from scratch, I highly recommend reading the book of Data Structures & Algorithm Analysis in C++ by Mark A. Weiss.

In Chap 8, it starts from basic find / union, then it gradually adds union by height / depth / rank, and find compression. In the end, it provides Big-O analysis.

Trust me, it has everything you want to know about Disjoint Sets and its C++ implementation.

score 0 · Answer 8 · answered Jul 19 '15 at 21:07

The following code seems simple to understand for implementing union-find disjoints sets by path compression

int find(int i)
{
    if(parent[i]==i)
    return i;
    else
    return parent[i]=find(parent[i]);
}
void union(int a,int b)
{
    x=find(a);y=find(b);
        if(x!=y)
        {
            if(rank[x]>rank[y])
            parent[y]=x;
            else
            {
            parent[x]=y;
            if(rank[x]==rank[y])
            rank[y]+=1;             
            }
        }
}

score 0 · Answer 9 · answered Jun 02 '16 at 12:57

If you are trying to ask which style is better to imlement disjoint set (vector or map(rb tree) ) , then i may have something to add

make_set (int key , node info ) : this is generally a member function to (1) add a node and (2) make node point to itself ( parent = key) , this intially creates a disjoint set. operation time complexity for vector O(n) , for map O(n*logn) .
find_set( int key ) : this has two fuctions generally , (1) finding the node through given key (2) path compression . I couldnt really calculate it for path compression , but for simply searching for the node , the time complexity for (1) vector O(1) and for (2) map O(log(n)) .

Concludingly , i would like to say that although looking over here , vector implementation looks better, time complexities of both are O(M*α(n)) ≈ O(M*5) or so i have read .

ps. do verify what i have written though i am pretty sure it's correct .

score 0 · Answer 10 · answered Jan 24 '12 at 20:55

Your implementation is fine. All you now need to do is to keep an array of disjoint set Nodes so that you can call your union/find methods on them.

For Kruskal's algorithm, you'll need an array containing one disjoint-set Node per graph vertex. Then, when you look up the next edge to add to your subgraph, you would use the find method to check if those nodes are both in your subgraph already. If they are, then you can move on to the next edge. Otherwise, it's time to add that edge to your subgraph and perform a union operation between the two vertices connected by that edge.

Implementing Disjoint Sets (Union Find) in C++

10 Answers10

Linked