1

I have been browsing posts on this site, the library's documentation, and discussions and explanations on other sites. However, I am struggling to understand how the C++ Boost Graph library works in my application. My questions are highlighted in bold. I list them chronologically regarding what part of the function they refer to. The bottom question is the most essential one though.

Consider the following example using C++ 17:

#include <vector>
#include <boost/config.hpp>
#include <boost/graph/graph_traits.hpp>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <boost/property_map/property_map.hpp>

typedef boost::adjacency_list <boost::listS, boost::vecS, boost::directedS, boost::no_property, boost::property<boost::edge_weight_t, double> > graph_t;
typedef boost::graph_traits<graph_t>::vertex_descriptor vertex_descriptor;

void example_paths(std::vector<int>& from, std::vector<int>& to, std::vector<double>& weights, int source, std::vector<int>& targets) {

}

The input vectors from, to, and weights have around 2.2 billion elements each and denote the edges of the graph. from hosts the origin IDs, to the destination IDs, and weights the edge weights. I.e. the first edge goes from vertex from[0] to vertex to[0] and has weight weights[0]. This data creates a graph with around 250 million vertices. Almost all vertices are connected to the main graph. Strictly speaking, it is thus a multi-graph, are however a graph made up of multiple graphs is called in this library. From that graph, I want to compute the shortest paths from the vertex with the ID source to the vertices with the IDs targets.

In my code, I follow the documentation's example and define the OutEdgeList as a boost::listS and the VertexList as a boost::vecS. I read that the types (boost::vecS, boost::listS, boost::slistS, boost::setS, boost::multisetS, and boost::hash_setS) available for OutEdgeList and VertexList differ in terms of computational efficiency. As I have to consider the total computational efficiency, taking both the graph construction from the input vectors and the application of Dijkstra's algorithm into account, I do not know which type I should choose. What types am I supposed to use for OutEdgeList and VertexList?

That brings me to my second question. I have found two drastically different methods of adjacency list construction:

graph_t g;
for(int i = 0; i < from.size(); i++) {
  boost::add_edge (from[i], to[i], weights[i], g);
}

and

typedef std::pair< int, int > Edge;
const int num_nodes = 5;
enum nodes { A, B, C, D, E };
char name[] = "ABCDE";
Edge edge_array[] = { Edge(A, C), Edge(B, B), Edge(B, D), Edge(B, E),
  Edge(C, B), Edge(C, D), Edge(D, E), Edge(E, A), Edge(E, B) };
int weights[] = { 1, 2, 1, 2, 7, 3, 1, 1, 1 };
int num_arcs = sizeof(edge_array) / sizeof(Edge);
graph_t g(edge_array, edge_array + num_arcs, weights, num_nodes);

where the second approach is taken from the documentation and not adjusted to my application. Which approach is more efficient? If it is the second one, what would be the translation for my example? And is it a problem that I use numeric instead of character IDs?

In general, I want to implement three variants of this shortest path function. The first variant only obtains the vertex IDs along the shortest paths. I.e. it generates a std::vector<std::vector<int> > where the first element is a vector from source to targets[0] etc. The second variant only returns the distances, i.e. a std::vector<double> where the first element is the sum of edge weights along the shortest path between source and targets[0] etc. The third variant returns the output of both the other variants. From what I understand, calculating the shortest paths works something like this

std::vector<vertex_descriptor> p(num_vertices(g));
std::vector<double> d(num_vertices(g));
vertex_descriptor s = vertex(1, g);
std::set<vertex_descriptor> t = {vertex(5, g), vertex(7, g)};

boost::dijkstra_shortest_paths(g, s, boost::predecessor_map(&p[0]).distance_map(&d[0]).visitor(target_visit(targets, on_examine_vertex())));

where I can add or leave out parts, depending on whether I want to extract the distances or the vertex IDs along the paths. Though, I do not really understand how it works. How do I call boost::dijkstra_shortest_paths to obtain the std::vectors in each of the three function variants described above?

I am sorry for this lengthy post. I suppose that these issues are intuitive for someone experienced with this library, but I have been stuck on them for days. I am aware that some people on this platform are sensitive to questions i.a. asking about efficiency, but dealing with billions of edges requires computational efficiency to be a primary consideration.

Please do not hesitate to ask questions in case parts of my post require clarification.

Thanks in advance.

Chr
  • 1,017
  • 1
  • 8
  • 29
  • 1
    "Which approach is more efficient?" As always, implement both and measure their performance. You will need to do this, no matter what opinions you see in replies to your question. – ravenspoint Aug 05 '23 at 15:44
  • @ravenspoint For me to test the efficiency, I first need to have a better understanding how e.g. adjacency list construction works. I.e. I need to have at least some of my questions answered before I can meaningfully test efficiency. – Chr Aug 05 '23 at 15:48
  • Which of the several questions in your post do you want answered first? ( Ideally, one question per post makes life so much simpler ) – ravenspoint Aug 05 '23 at 16:24
  • @ravenspoint The bottom question is the most important one. Without an answer to that, the function does not work at all. – Chr Aug 05 '23 at 16:47
  • Please edit your post so the "bottom" question is at the top and preferably is the only question. – ravenspoint Aug 05 '23 at 16:58
  • @ravenspoint I see your point, but object to splitting this post across questions nonetheless because of the following reasons. (i) My questions are interconnected. Posting them together, makes the topic considerably easier to understand. (ii) Stack Overflow is after all one of the most toxic sites on the internet - in particular when it comes to C++ questions. I predict that splitting my post would lead to the new questions being downvoted and closed for being duplicates, unclear, or off topic within minutes. I will highlight the bottom question though. – Chr Aug 05 '23 at 17:11
  • @ravenspoint I kept the chronological order for clarity, but edited the post to point to the bottom question in the introduction. – Chr Aug 05 '23 at 17:16

1 Answers1

2

How do I call boost::dijkstra_shortest_paths to obtain the std::vectors in each of the three function variants described above?

One call to boost::dijkstra_shortest_paths() gives you all the data that you need. To extract the data in the form you want requires working through the 'syntactical sugar' that encumbers boost::graph

Variant 1 : paths from source to all targets

This requires backtracking from target to source through the parent vector. Like this:

std::vector<std::vector<int>> Variant1(
    graph_t &g,
    const std::vector<int> &p)
{
    std::vector<std::vector<int>> ret;
    for (int target = 1; target < name.length(); target++)
    {
        std::vector<int> path;
        int vertex = target;
        while (vertex != 0)
        {
            path.push_back(vertex);
            vertex = p[vertex];
        }
        std::reverse(path.begin(), path.end());
        ret.push_back(path);
    }
    return ret;
}

Variant 2 : distance from source to all targets

The distance vector gives you this directly

Variant 3 : returns the output of both the other variants.

Just what it says.


You can do all three in one pass, something like this:

dijkstra_shortest_paths(
    g, s,
    predecessor_map(make_iterator_property_map(
                        p.begin(), get(vertex_index, g)))
        .distance_map(make_iterator_property_map(
            d.begin(), get(vertex_index, g))));

displayVariant1(Variant1(g, p));
displayVariant2(g, d);
displayVariant3(g, p, d);

which outputs:

Variant1:
Path from A to C: C D E B
Path from A to C: C
Path from A to C: C D
Path from A to C: C D E

Variant2:
distance from A to A = 0
distance from A to B = 6
distance from A to C = 1
distance from A to D = 4
distance from A to E = 5

Variant3:
Path from A to A:  C D E B Total distance: 0
Path from A to B:  C Total distance: 6
Path from A to C:  C D Total distance: 1
Path from A to D:  C D E Total distance: 4
Path from A to E:  Total distance: 5

Note that if you only want one of variant 1 or 2 without 3 at all, you can omit the unneeded map. Like this:

dijkstra_shortest_paths(
    g, s,
    distance_map(make_iterator_property_map(
            d.begin(), get(vertex_index, g))));

displayVariant2(g, d);

The complete code for the entire application is at https://gist.github.com/JamesBremner/30a8120ffac6ce884a5ee290eff9d08e

For my general comment on dealing with the baroque boost::graph docs see https://stackoverflow.com/a/1514439/16582




The above answers your question as posed. But, this is NOT what I would do for real work. So, here is some general, unsolicited advice.

  1. Use bundled properties rather than property maps. This is much simpler. https://www.boost.org/doc/libs/1_82_0/libs/graph/doc/bundles.html

  2. The boost::graph application programming interface ( API ) is both arcane and outdated. To simplify routine coding I would hide the the API in a thin wrapper class. As an example: https://github.com/JamesBremner/so76842374

  3. The boost::graph API is the way it is because it is optimized to be generic, i.e. extremely flexible and applicable to almost any graph theory problem. It is NOT optimized for performance. To handle billions of edges, I think you will need something more streamlined. For example, PathFinder is able to find the shortest paths from a source to every other vertex in a graph with millions of edges in seconds. ( details at https://github.com/JamesBremner/PathFinder/wiki/Costs#performance )

ravenspoint
  • 19,093
  • 6
  • 57
  • 103
  • Thanks. Do I always need to run the full `dijkstra_shortest_paths(g, s, predecessor_map(boost::make_iterator_property_map(p.begin(), get(boost::vertex_index, g))).distance_map(boost::make_iterator_property_map(d.begin(), get(boost::vertex_index, g))));`, irrespective of which variant I want to call? Or can I omit part of it, if I only want variant 1 or variant 2? – Chr Aug 07 '23 at 08:50
  • 1
    Yes. See update to answer. – ravenspoint Aug 07 '23 at 12:14
  • 1
    @Chr FYI I have added some general advice to my answer. – ravenspoint Aug 07 '23 at 15:23
  • Thanks. I will play around with that. – Chr Aug 07 '23 at 17:21