How do I extract a subgraph from a directed graph using Boost.Graph?

Question

I have a very large graph dot file and want to extract a subgraph from a given vertex. This subgraph should contain the given vertex and all vertexes below it.

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <boost/graph/graphviz.hpp>

#include <iostream>

struct Vertex {
    std::string node;
};

using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS, Vertex>;

template <typename Fn>
requires std::is_invocable_r_v<bool, Fn, Graph::vertex_descriptor>
void removeVertexIf(Graph& graph, Fn const& fn) {
    Graph::vertex_descriptor count = num_vertices(graph);
    Graph::vertex_descriptor i = 0;
    while (i < count) {
        if (fn(i)) {
            clear_vertex(i, graph);
            remove_vertex(i, graph);
            --count;
        }
        else {
            ++i;
        }
    }
}

struct DFSVisitor : boost::default_dfs_visitor {
    DFSVisitor(std::vector<bool>& reachable)
        : reachable(reachable)
    {}

    void discover_vertex(Graph::vertex_descriptor const index, Graph const&) {
        reachable[index] = true;
    }

    std::vector<bool>& reachable;
};

void removeUnreachable(Graph& graph, Graph::vertex_descriptor const start_index) {
    std::vector<bool> reachable(num_vertices(graph), false);

    DFSVisitor visitor(reachable);

    depth_first_search(graph, boost::visitor(visitor).root_vertex(start_index));

    removeVertexIf(graph, [&](Graph::vertex_descriptor const index) {
        return reachable[index];
    });
}

int main() {
    std::istringstream input(
        "digraph{"
        "0;1;2;3;4;5;6;7;8;9;"
        "0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;"
        "}");

    Graph g(0);

    boost::dynamic_properties dp(boost::ignore_other_properties);
    dp.property("node_id", get(&Vertex::node, g));

    boost::read_graphviz(input, g, dp);

    // delete everything that is not below Node 6
    removeUnreachable(g, 6);

    boost::write_graphviz(std::cout, g);
}

In this minimal example, the given vertex has the node ID 6. The following graphic shows what I want to extract:

How can I remove the nodes and edges that are not below Node 6? My current removeUnreachable iterates over the entire graph instead of starting at start_index by depth_first_search.

Dot file to SVG graphic:

dot -Tsvg out.dot -o out.svg

I am having a hard time understanding your code. It looks like your are using BOTH the visitor AND the color map. You must choose one or the other, not both. — ravenspoint, Aug 30 '23 at 13:16
@ravenspoint Okay, I managed to use the named parameter syntax so I don't have to provide a colormap. Unfortunately, the search still iterates over the entire tree instead of starting at node 6. — Benjamin Buch, Aug 30 '23 at 14:00
@ravenspoint can you point to the documentation that confirms your claim? TTBOMK you can (obviously?) combine these just fine. In fact, when using the correct algorithm (see my answer) it will be required. — sehe, Aug 30 '23 at 16:38
@ravenspoint Literally the only claim you made? "You must choose one or the other, not both" (If you to @ sehe I'm less likely to miss replies) — sehe, Aug 30 '23 at 19:44
@sehe His claim referred to an older version of my question where I tried to provide a default colormap because I didn't understand the named parameter syntax. It might be possible to use both together, but it wasn't of any use in this context. His hint helped my to find my current solution at this point. Tomorrow I will go through your very detailed answer. Thanks a lot for making so much effort! — Benjamin Buch, Aug 30 '23 at 20:48

ravenspoint · Answer 1 · 2023-08-30T14:36:51.380

boost graph has a method to read a dot file into a boost graph.

Start at your specified node and do a DFS or BFS. Mark nodes as you visit them. When the search ends, remove unmarked nodes and all their links.

In more detail:

define a visitor that marks each node as visited. Easiest would be a functor that has a boolean vector sized with the number of nodes and initialized false.
call detph_first_search with your visitor
loop over the nodes, removing any that are still false and their edges.

Alternatively, you might want to use the 'color map'. Lots of code for doing that here https://stackoverflow.com/a/45828012/16582

Personally, I no longer use boost::graph - the API is clunky and a bit old fashioned, while the documentation is downright mysterious. I prefer to use my graph theory library at https://github.com/JamesBremner/PathFinder

So this works well:

#include <string>
#include <fstream>
#include <sstream>
#include <iostream>
#include <vector>
#include <queue>
#include <algorithm>

#include "GraphTheory.h"

#include <iostream>

void readDot(std::istringstream &input, raven::graph::sGraphData &gd)
{
    gd.g.clear();
    gd.g.directed();

    std::string a;
    while (getline(input, a, ';'))
    {
        int p = a.find("->");
        if (p != -1)
        {
            gd.g.add(a.substr(0, p),
                     a.substr(p + 2));
        }
    }
}

void bfs(
    raven::graph::sGraphData &gd)
{
    // queue of visited vertices with unsearched children
    std::queue<int> Q;

    // track nodes that have been visited
    // prevent getting caught going around and around a cycle
    std::vector<bool> visited(gd.g.vertexCount(), false);


    // start at start
    int v = gd.g.find(gd.startName);
    std::cout << gd.g.userName(v) << " ";
    Q.push(v);
    visited[v] = true;

    // loop while the queue is not empty
    while (!Q.empty())
    {
        // get current vertex from front of queue
        v = Q.front();
        Q.pop();

        // loop over vertices reachable from current vertex
        for (int u : gd.g.adjacentOut(v))
        {
            if (!visited[u])
            {
                // visit to a new node
                std::cout << gd.g.userName(u) << " ";

                // add to queue and mark visited
                Q.push(u);
                visited[u] = true;
            }
        }
    }

    std::cout << "\n";
}

int main()
{
    std::istringstream input(
        "digraph{"
        "0;1;2;3;4;5;6;7;8;9;"
        "0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;"
        "}");
    const std::string start_index = "6";

    raven::graph::sGraphData gd;
    gd.startName = "6";

    readDot(input, gd);

    bfs(gd);
}

Giving the output

6 7 5 8 9

My code already contains the load and as I wrote I failed to solve it by using a BFS. I don't understand how I can do what you describe. That's excatly my problem. — Benjamin Buch, Aug 29 '23 at 17:22
I have provided more detail. Not sure what more I could say without writing the code for you. — ravenspoint, Aug 29 '23 at 17:36
Why don't you post YOUR code and we can maybe spot any bugs. — ravenspoint, Aug 29 '23 at 17:37
That's definitely helpfull, so I upvoted your answer. I'm in a company context where I can't just use other libraries, so its not a solution for me. Nevertheless you pointed me into the right direction. I will post my solution as a separate answer. Thank you very much! — Benjamin Buch, Aug 30 '23 at 15:53

sehe · Accepted Answer · 2023-08-30T16:43:55.660

You already figured out that depht_first_search does more than you thought it did. Instead of complicating the visitor, I'd suggest to use depht_first_visit instead: https://www.boost.org/doc/libs/1_83_0/libs/graph/doc/depth_first_visit.html

Other Issues

Adjacency lists with vertex container selector vecS have an implied contiguous integral vertex index, which doubles as the descriptor in that case. You must have been somewhat aware of this because your spelled it out:

Graph::vertex_discriptor count = num_vertices(graph);

When you remove an early vertex, you are effectively renumbering all vertices with higher index. This make it so that your removeVertexIf loop invalidates the values inside the reachable map.

One way to avoid this would be to go by the name property (Vertex::node in your example). Another way is to renumber your unreachable entries in parallel with the removal, but this breaks the encapsulation of the predicate function: the predicate now must know about the remove algorithms internals.

Another option, of course, would be to have a (temporary) extra mapping that indirects the original vertex index to the current index.

Lastly you could select a vertex container that has reference and descriptor stability (e.g. setS and listS).

Note that it may be much more performant to NOT REMOVE any vertices, instead just filtering them out on the fly. I'll present this as the BONUS take below

Plenty of options, let's go with the simplest:

ListS

See it Live On Coliru

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <boost/graph/graphviz.hpp>
#include <iostream>

using Graph = boost::adjacency_list<boost::vecS, boost::listS, boost::directedS, //
                                    boost::property<boost::vertex_index_t, size_t>>;
using V     = Graph::vertex_descriptor;
using VSet  = std::set<V>;

template <typename Pred>
    requires std::predicate<Pred, V>
void removeVertexIf(Graph& g, Pred pred) {
    for (auto [it, e] = vertices(g); it != e;) {
        if (auto v = *it++; pred(v)) {
            clear_vertex(v, g);
            remove_vertex(v, g);
        }
    }
}

struct DFSVisitor : boost::default_dfs_visitor {
    DFSVisitor(VSet& reachable) : reachable(reachable) {}
    void discover_vertex(V v, Graph const& g) const {
        std::cout << "Marking " << get(boost::vertex_index, g, v) << " reachable\n";
        reachable.insert(v);
    }

    VSet& reachable;
};

void removeUnreachable(Graph& g, V start) {
    VSet reachable;
    std::vector<boost::default_color_type> colors(num_vertices(g));
    DFSVisitor visitor(reachable);

    auto idx = get(boost::vertex_index, g);
    depth_first_visit(g, start, visitor, make_iterator_property_map(colors.begin(), idx));

    removeVertexIf(g, [&](V v) { return !reachable.contains(v); });
}

int main() {
    Graph g;
    auto idx = get(boost::vertex_index, g);

    {
        std::istringstream        input("digraph{0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;}");
        boost::dynamic_properties dp(boost::ignore_other_properties);
        dp.property("node_id", idx);

        read_graphviz(input, g, dp);
    }

    // delete everything that is not below Node 6
    auto v6 = vertex(6, g);
    assert(idx[v6] == 6);
    removeUnreachable(g, v6);

    write_graphviz(std::cout, g);
}

Of course you can make it work for vecS iff you remove in the correct order:

template <typename Pred>
    requires std::predicate<Pred, V>
void removeVertexIf(Graph& g, Pred pred) {
    for (auto v : boost::adaptors::reverse(vertices(g))) {
        if (pred(v)) {
            clear_vertex(v, g);
            remove_vertex(v, g);
        }
    }
}

Note though that you get the output you should expect: Live On Coliru

digraph G {
0;
1;
2;
3;
4;
0->3 ;
1->2 ;
1->0 ;
2->3 ;
3->4 ;
}

To keep the original node ids, make it explicit:

write_graphviz_dp(std::cout, g, dp);

Now printing Live On Coliru:

digraph G {
5;
6;
7;
8;
9;
5->8 ;
6->7 ;
6->5 ;
7->8 ;
8->9 ;
}

BONUS: Why Remove At All?

The performance of clear_vertex and remove_vertex is going to make you cry¹. Instead, just filter for your target vertices:

Live On Coliru

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <boost/graph/filtered_graph.hpp>
#include <boost/graph/graphviz.hpp>
#include <iostream>

struct VProp { size_t original_idx; };
using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS, VProp>;
using V     = Graph::vertex_descriptor;
using VSet  = std::set<V>;

struct DFSVisitor : boost::default_dfs_visitor {
    DFSVisitor(VSet& reachable) : reachable(reachable) {}
    void  discover_vertex(V v, Graph const&) const { reachable.insert(v); }
    VSet& reachable;
};

VSet reachable(Graph& g, V start) {
    VSet reachable;
    std::vector<boost::default_color_type> colors(num_vertices(g));
    DFSVisitor visitor(reachable);

    depth_first_visit(g, start, visitor, colors.data());

    return reachable;
}

int main() {
    Graph g;
    auto idx = get(&VProp::original_idx, g);

    boost::dynamic_properties dp(boost::ignore_other_properties);
    dp.property("node_id", idx);

    {
        std::istringstream input("digraph{0->1;1->2;2->3;2->6;3->4;4->5;5->8;6->7;6->5;7->8;8->9;}");
        read_graphviz(input, g, dp);
    }

    // filter subgraph below Node 6
    auto const vv = reachable(g, vertex(6, g));
    write_graphviz_dp(
        std::cout,
        boost::filtered_graph(g, boost::keep_all{}, std::function([&vv](V v) { return vv.contains(v); })),
        dp);
}

Printing, again:

digraph G {
5;
6;
7;
8;
9;
5->8 ;
6->7 ;
6->5 ;
7->8 ;
8->9 ;
}

¹ see e.g. Remove 100,000+ nodes from a Boost graph

How do I extract a subgraph from a directed graph using Boost.Graph?

2 Answers2

Other Issues

ListS

BONUS: Why Remove At All?