2

I am trying to use disjoint sets from Boost but, after swimming all day through StackOverflow and the documentation I could not solve my problem.

The main problem is: given some maps that will act as Rank and Parent, of an int element type (in the future that element type needs to be a pointer to an actual object though), and after doing some union_sets, get a vector of vectors: each outer vector is a connected component number, each inner vector the list of points (or, pointers) that compose that connected component.

E.g.: v[1] -> [0, 30, 234, ...].

I have looked at this, this and this + several other questions here in SO and every result in google's front page.

I have created a small example using code from user @janoma. However, his answer, while really good, it's "too customized" to his needs and after tinkering for a while I could not see how to adapt his code to the use of std::maps.

/*!
 * Adapted from
 *   http://janoma.cl/post/using-disjoint-sets-with-a-vector/?i=1
 *   https://github.com/janoma/study/blob/master/disjoint_sets/main.cpp
 *
 */

#include <algorithm>
#include <map>
#include <iomanip>
#include <iostream>
#include <stdlib.h>
#include <random>
#include <vector>

#include <boost/pending/disjoint_sets.hpp>
#include <boost/pending/property.hpp>

typedef int element_t;

void
printElements(std::vector<int>& elements, boost::disjoint_sets<boost::associative_property_map<std::map<int,int>>, boost::associative_property_map<std::map<int,int>>> sets)
{
    std::cout << "Elements:            ";
    for (size_t i = 0; i < elements.size(); ++i)
    {
        std::cout << std::setw(4) << elements[i];
    }
    std::cout << std::endl;
    std::cout << "Set representatives: ";
    for (size_t i = 0; i < elements.size(); ++i)
    {
        std::cout << std::setw(4) << sets.find_set(elements[i]);
    }
    std::cout << std::endl;
}

int main()
{
    // initialization
    std::vector<element_t> elements;
    elements.reserve(30);
    for (size_t i = 0; i < elements.capacity(); ++i)
    {
        elements.push_back(element_t(rand() % 90));
    }

    // disjoint sets
    std::map<element_t,int> rank;
    std::map<element_t,element_t> parent;

    boost::disjoint_sets<
        boost::associative_property_map<std::map<element_t,int>>,
        boost::associative_property_map<std::map<element_t,element_t>> > sets(
            boost::make_assoc_property_map(rank),
            boost::make_assoc_property_map(parent));

    // initialize disjoint sets
    for (size_t i = 0; i < elements.size(); ++i)
    {
        sets.make_set(elements.at(i));
    }

    // unions
    for (size_t i = 0; i < elements.size()/2; ++i)
    {
        // Union between this element and one randomly chosen from the rest
        size_t j = rand() % elements.size();
        sets.union_set(elements[i], elements[j]);
    }

    std::cout << "Found " << sets.count_sets(elements.begin(), elements.end()) << " sets:" << std::endl;
    printElements(elements,sets);

    // compression
    sets.compress_sets(elements.begin(), elements.end());

    // QUICK & DIRTY
    std::vector<element_t> representatives;
    representatives.reserve(30);
    for (size_t i = 0; i < elements.capacity(); ++i)
        representatives.push_back(sets.find_set(elements[i]));
    // ---

    std::cout << std::endl << "After path compression:" << std::endl;
    printElements(elements,sets);

    std::sort(elements.begin(),elements.end(), [representatives](auto lhs, auto rhs){ return representatives[lhs] < representatives[rhs]; });

    std::cout << std::endl << "After path compression and sorting:" << std::endl;
    printElements(elements,sets);
}

The expected result would be the last part you get if you execute janoma's code, that is:

Alternative, using iterators:
    Sorted set: 1 8 12 16 23 27 32 37 46 46 50 55 60 62 69 73 76 79 87 
    Sorted set: 23 36 
    Sorted set: 62 
    Sorted set: 13 25 25 52 67 69 71 80 

Actual result is, well, I didn't get to the point of breaking it into separate lists, but:

After path compression and sorting:
Elements:              76  55  37  62  80  62  69  87  71  46  52  36  60  73  79  50  67  32  69  46  23   1   8  12  23  27  13  16  25  25
Set representatives:   76  55  37  62  80  62  50  87  71  55  52  36  60  55  55  50  52  87  50  55  55  87  50  60  55  50  52  50  52  52

It's unordered.

At this point I am left without resources to keep looking / learning how to properly use boost disjoint sets.

Adrian
  • 755
  • 9
  • 17

1 Answers1

1

I am confused about what the code in question is attempting to do exactly, but I can answer the general question "How do you retrieve the sets from boost's implementation of disjoint_sets?" Basically you use the the parent map that was constructed by the data structure.

The disjoints_sets data structures represents each set as the "children" of an arbitrary member of the set, call these arbitrary members the set representative. The parent property map built by the Boost library associates each element with the representative of the set it belongs to. To build the actual sets then, we need to essentially invert the parent map. We iterate over the parent map constructing a map from representative to elements, which is a one-to-many relationship so this map has vectors of elements as the values. The values of this map will be the sets we are looking for.

Code below. The following finds the connected components in enter image description here

I use strings as elements just to get an example on StackOverflow that does not use integer items and that is complete. Note the get_unique_vertices function is just a artifact of the design of this code which builds the graph forest solely using the edges as input. You could do the below without this step if you already know the vertices or by using the disjoint_sets data structure itself to keep track of them. I did it this way to keep the actual usage of disjoint_sets as concise as possible:

#include "boost/pending/disjoint_sets.hpp"
#include "boost/property_map/property_map.hpp"
#include <iostream>
#include <tuple>
#include <unordered_set>
#include <unordered_map>

template<typename T>
using assoc_map = boost::associative_property_map<T>;
using rank_map = std::unordered_map<std::string, int>;
using parent_map = std::unordered_map<std::string, std::string>;
using disjoint_sets = boost::disjoint_sets<assoc_map<rank_map>, assoc_map<parent_map>>;

std::vector<std::string> get_unique_vertices(const std::vector<std::tuple<std::string, std::string>>& edges)
{
    std::unordered_set<std::string> vertex_set;
    std::vector<std::string> vertices;
    vertices.reserve(2 * edges.size());
    for (const auto [u, v] : edges) {
        if (vertex_set.find(u) == vertex_set.end()) {
            vertex_set.insert(u);
            vertices.push_back(u);
        }
        if (vertex_set.find(v) == vertex_set.end()) {
            vertex_set.insert(v);
            vertices.push_back(v);
        }
    }
    return vertices;
}

std::vector<std::vector<std::string>> find_connected_components(const std::vector<std::tuple<std::string, std::string>>& edges)
{
    rank_map rank;
    parent_map parent;
    disjoint_sets ds( boost::make_assoc_property_map(rank),  boost::make_assoc_property_map(parent));

    // insert all the vertices as single sets
    auto vertices = get_unique_vertices(edges);
    for (const auto& v : vertices) {
        ds.make_set(v);
    }

    // add each graph edge to the data structure
    for (const auto [u, v] : edges) {
        ds.link(u, v);
    }

    // build a map mapping representatives to set elements...
    std::unordered_map<std::string, std::vector<std::string>> sets;
    for (const auto& v : vertices) {
        auto parent = ds.find_set(v);
        sets[parent].push_back(v);
    }

    // return just the values from the above
    std::vector<std::vector<std::string>> output(sets.size());
    std::transform(sets.begin(), sets.end(), output.begin(),
        [](const auto& key_val) {
            return key_val.second;
        }
    );

    return output;
}

int main()
{
    std::vector<std::tuple<std::string, std::string>> edges = {
       {"A" , "B"},
       {"D" , "E"},
       {"H" , "I"},
       {"K" , "J"},
       {"E" , "F"},
       {"B" , "C"},
       {"H" , "K"},
       {"E" , "G"},
       {"I" , "J"}
    };

    auto connected_components = find_connected_components(edges);
    for (const auto& cc : connected_components) {
        for (const auto& vertex : cc)
            std::cout << vertex;
        std::cout << "\n";
    }
}

which yields the following output:

HIKJ
ABC
DEFG
jwezorek
  • 8,592
  • 1
  • 29
  • 46
  • Sadly it's been over a year and a half and I no longer work on that code nor do I have access to the development environment to test your answer, but I'll accept it as a "thank you" for the effort. IIRC, what I was trying to do is implementing a clustering algorithm that had to return lists of nodes (the connected components of the graph). Disjoint sets seemed the cleanest/most efficient way to proceed. – Adrian Sep 08 '20 at 07:49
  • in this solution, you have to know all the vertices which might not be available in some applications. – Jingpeng Wu Jul 20 '21 at 16:06
  • well if you know the edges then you know the vertices, but you may have to dedup them into a list of unique vertices as i do in this solution – jwezorek Jul 20 '21 at 16:41