Why does the read_graphviz() function of the boost graph library changes the index of the nodes

Question

When reading a graph with read_graphviz() of the boost graph Library, the internal index used to store the nodes changes compared with the one used to display the same graph. This is problematic when dealing with big graphs as it becomes difficult to compare them as text files.

I wrote a small example (see below the source code), where I start reading a small graph with

INPUT

digraph G {
0[index=0];
1[index=1];
2[index=2];
10[index=10];
}

But then, when printing it with write_graphviz() the output ordering changes:

OUTPUT

digraph G {
0[index=0];
1[index=1];
2[index=10]; /* <= mismatch here ! */
3[index=2];  /* <= and here !      */
}

Although I understand why the node index 10 is reaffected to a smaller value such as all indexes follow each others in ascending order, I don't get why it gets affected index 2 and not 3 as it comes after the index 2 (in input).

I guess that it is because the order used must be somehow a lexicographic order on the index property interpreted as a string, but is this really the case ? In my own program, nodes indexes go above 150 and I get the following reordering:

0[index=0000];
1[index=0001];
2[index=0010];
3[index=0100];
4[index=0101];
5[index=0102];
6[index=0103];
7[index=0104];
8[index=0105];
9[index=0106];
10[index=0107];
11[index=0108];
12[index=0109];
13[index=0011];
14[index=0110];
15[index=0111];

My Question: how can I avoid this behavior such that internal node index always match my own index ?

Here is the code of this simple example:

#include <iostream>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graphviz.hpp>

using namespace std;

struct Node { std::size_t index; };


/*! \brief DAG: Directed Acyclic Graph */
typedef boost::adjacency_list<boost::listS, boost::vecS, boost::directedS, Node> DAG;

template<class IndexMap> class node_writer {
public:
    node_writer(IndexMap n) : inm(n) {}
    template<class Vertex>  void operator()(std::ostream & out, const Vertex & v) {
        out << "[index=" << inm[v] << "]";
    }

private:
    IndexMap inm;
};

template<class IndexMap>
inline node_writer<IndexMap> make_node_writer(IndexMap inm) {
    return node_writer<IndexMap>(inm);
}

std::ostream & operator<<(std::ostream & o, const DAG & g) {
    boost::write_graphviz(o, g, make_node_writer(boost::get(&Node::index, g)));
    return o;
}

std::istream & operator>>(std::istream & i, DAG & g)
{
    boost::dynamic_properties dp(boost::ignore_other_properties);
    dp.property("index", get(&Node::index, g));
    read_graphviz(i, g, dp);
    return i;
}

int main(int argc, char * argv[])
{
    DAG * pTree = new DAG(0);
    std::stringstream  dag_file;
    dag_file << "digraph G {\n";
    dag_file << "0[index=0];\n";
    dag_file << "1[index=1];\n";
    dag_file << "2[index=2];\n"; 
    dag_file << "10[index=10];\n";
        dag_file << "}";
    std::cout << dag_file.str() << "\n";
    dag_file >> *pTree;
    std::cout <<  *pTree;
    return 0;
}

score 1 · Accepted Answer · answered Feb 05 '20 at 00:15

My Question: how can I avoid this behavior such that internal node index always match my own index ?

You have to make it print as the node_id as well as the index attribute. I'd suggest using dynamic properties once again:

std::ostream & operator<<(std::ostream & o, DAG& g) {
    boost::dynamic_properties dp(boost::ignore_other_properties);
    dp.property("node_id", get(&Node::index, g));
    dp.property("index", get(&Node::index, g));
    boost::write_graphviz_dp(o, g, dp);
    return o;
}

Now it prints: Live On Coliru

digraph G {
0 [index=0];
1 [index=1];
10 [index=10];
2 [index=2];
}

Notes:

for technical reasons, dynamic_properties does not accept read-only property maps by default. You can work around this so you can define operator<< to take DAG const&, see boost::dynamic_properties and immutable graph object
You can also READ with node_id but there are some caveats. Your current graph model has a vertex container selector (vecS) that has an implicit vertex_index that matches the vector index. Reading the node_id from the index attribute would lead to a "sparse" graph (vertices 3..9 would also exist without any edges and no meaningful index property).

You can change the selector to something else, but then you require a vertex_index property map. The only elegant way to supply it implicitly to relevant algorithms would be
- using internal properties (instead of the property bundle that you're currently using, which can be cumbersome)
- using traits specializing the propert-map selector
Showing how to do that, just for completeness: Live On Coliru

Note that this no longer specifically reads the index attribute, because the node_id already populates Node::index directly.

Full Listing

Of the first solution above:

Live On Coliru

#include <iostream>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/graphviz.hpp>

using namespace std;

struct Node { std::size_t index; };

/*! \brief DAG: Directed Acyclic Graph */
typedef boost::adjacency_list<boost::listS, boost::vecS, boost::directedS, Node> DAG;

std::ostream & operator<<(std::ostream & o, DAG& g) {
    boost::dynamic_properties dp(boost::ignore_other_properties);
    dp.property("node_id", get(&Node::index, g));
    dp.property("index", get(&Node::index, g));
    boost::write_graphviz_dp(o, g, dp);
    return o;
}

std::istream & operator>>(std::istream & i, DAG & g) {
    boost::dynamic_properties dp(boost::ignore_other_properties);
    dp.property("index", get(&Node::index, g));
    read_graphviz(i, g, dp);
    return i;
}

int main() {
    auto const input = R"(digraph G {
    0[index=0];
    1[index=1];
    2[index=2];
    10[index=10];
})";
    std::stringstream dag_file(input);
    std::cout << input << std::endl;

    {
        DAG tree;
        dag_file >> tree;
        std::cout << tree;
    }
}

Thank you, that is clear. When you know who you write a question for, it encourages you to write it well. The next step is to make sure that write_graphivz() prints the nodes in the same order than the one used to read them...within my reach I guess. — Heyji, Feb 05 '20 at 20:18
I don't think it's easy/possible using `read_graphviz` because it reads into an intermediate `parse_result` which hardcodes `nodes` as a `std::map` which means the ordering is lost. However, if you care about ordering, you could consider parsing your graph manually. Note however, that in the general case you **cannot** expect to roundtrip the entire source without loss of information, since the graphviz description language is so versatile and the semantics do NOT match the format. I.e. many input representations can match the same graph. — sehe, Feb 06 '20 at 00:08
For a very detailed background on how the graphviz format describes graphs and why you need a very elaborate AST to model the input with full fidelity) see [this older answer](https://stackoverflow.com/a/46365596/85371) — sehe, Feb 06 '20 at 00:10
A very minimal subset of grammar that will 1. read your sample input (no edges) 2. preserve the vertex order could be done in ~25 lines of code, like here: http://coliru.stacked-crooked.com/a/17861a8a27017175 [note: no longer links to boost_graph or boost_regex] — sehe, Feb 06 '20 at 00:51
If you're not worried about reading partial graphs on error, then you could even shorten using a semantic action: http://coliru.stacked-crooked.com/a/932d031a7c36c72f — sehe, Feb 06 '20 at 00:59
Afterthought: Even simpler omitting the attributes of all ID_ http://coliru.stacked-crooked.com/a/3e5bb9f7d0a58514 — sehe, Feb 06 '20 at 03:22
Ouch ! Two years ago, I dove into boost::spirit, qi and all this stuff in the hope of writing my own parser (for my own needs). After several months, and although I pretend to have a good theoretical background to understand and manipulate these concepts, I confess I had to quit. Yes it is short and very well documented but the concepts needed to write these short pieces of code are tough to master (honestly). It is easier to learn the syntax of a flex/bison than learning spirit/qi/Karma/phenix/lex/lambdas/concepts/bgl/property maps/property bundles/internal propertie — Heyji, Feb 06 '20 at 22:32
Looks like I will have to get back to it :-) Thanks anyway for all the materials. It is way more than expected, and I only have one vote for your answer ! Enough posting on SO for the next six months. — Heyji, Feb 06 '20 at 22:34
I can't disagree on the learning curve of boost libraries. They're like piecing together building blocks from atoms. It's incredibly versatile, but you have to cut through a lot of learning to employ these effectively. I use Boost liberally, but I always make sure I compartmentalize the complexity behind a simple interface. Good luck with the project :) — sehe, Feb 07 '20 at 13:28

Why does the read_graphviz() function of the boost graph library changes the index of the nodes

1 Answers1

Full Listing