1

I'm using the boost graph library to represent graph data. Algorithms and serialization are pretty well supported and the documentation is quite thorough.

That been said, I've come to a point where I need to "version my graph", and cannot find any related tutorial or example. By version I mean the ability to:

  • Efficiently record version information in a graph, e.g. v1.0, v1.1 etc, and provide a succinct way to work with a specific version of the graph.
  • Handle serialization and information retrieval. E.g. how do I store the versioned graph and can I produce the "relative difference" between two versions?

I'd like to know if there is a widely accepted algorithmic approach or a BGL specific recipe that would apply in my case.

As a minimal example you can consider the family tree example:

#include <boost/config.hpp>
#include <iostream>
#include <vector>
#include <string>
#include <boost/graph/adjacency_list.hpp>
#include <boost/tuple/tuple.hpp>
enum family
{
    Jeanie,
    Debbie,
    Rick,
    John,
    Amanda,
    Margaret,
    Benjamin,
    N
};
int main()
{
    using namespace boost;
    const char* name[] = { "Jeanie", "Debbie", "Rick", "John", "Amanda",
        "Margaret", "Benjamin" };

    adjacency_list<> g(N);
    // 1. How do I extend the graph accounting for versions?
    add_edge(Jeanie, Debbie, g);
    add_edge(Jeanie, Rick, g);
    add_edge(Jeanie, John, g);
    add_edge(Debbie, Amanda, g);
    add_edge(Rick, Margaret, g);
    add_edge(John, Benjamin, g);

    graph_traits< adjacency_list<> >::vertex_iterator i, end;
    graph_traits< adjacency_list<> >::adjacency_iterator ai, a_end;
    property_map< adjacency_list<>, vertex_index_t >::type index_map
        = get(vertex_index, g);

    // 2. How do I iterate the graph based on versions?
    for (boost::tie(i, end) = vertices(g); i != end; ++i)
    {
        std::cout << name[get(index_map, *i)];
        boost::tie(ai, a_end) = adjacent_vertices(*i, g);
        if (ai == a_end)
            std::cout << " has no children";
        else
            std::cout << " is the parent of ";
        for (; ai != a_end; ++ai)
        {
            std::cout << name[get(index_map, *ai)];
            if (boost::next(ai) != a_end)
                std::cout << ", ";
        }
        std::cout << std::endl;
    }
    return EXIT_SUCCESS;
}

My question imposes three problems which I annotate in the code:

  1. If the family tree had a version of the family in the 1990's and one in 2010's, how could I extend the graph considering v1990 and v2010?
  2. When working with a specific version, can I isolate or exclusively use that version's data in my computations?
  3. How is serialization affected?
Lorah Attkins
  • 5,331
  • 3
  • 29
  • 63

2 Answers2

2

It seems you want to do versioning INSIDE the graph theory library. IMHO this is outside the scope of a graph library.

Suggestion: store your graph configuration ( nodes, edges, and their attributes ) in a text file with a format that is easy to read into the library and write out from the library. Then use version control software ( e.g. GIT ) to manage the versions of the text file.


On the question of "meaningful" diffs, it rather depends on what you consider meaningful. However, if you choose your text file format carefully you can exercise some control over what the diffs look like.

For example, a graph with two nodes and one edge ( my example has no attributes - usually I append their values to the end of the node or edge line )

n a
n b


e a b

Now add a node and an edge

n a
n b
n c


e a b
e b c

will give you a diff that looks like this ( depending on what diff tool and options you use )

enter image description here

To me, this seems reasonably "meaningful".

Your use case, as you have described it, seems not to need anything more complex than this.

Here is your description

  1. If the family tree had a version of the family in the 1990's and one in 2010's, how could I extend the graph considering v1990 and v2010?

  2. When working with a specific version, can I isolate or exclusively use that version's data in my computations?

  3. How is serialization affected?

In my proposal:

  1. The graph is not extended. The versioning is done by a version control package such as GIT

  2. You check out the required version from the version control package

  3. You will need to design the serialization format with some care so that the differences will be most meaningful to you. You can try using a serialization function provided by the library, but will likely need to write your own to get the maximum "meaning".

If you have any more requirements than these that necessitate anything more complex, then you will need to to specify them in your question

ravenspoint
  • 19,093
  • 6
  • 57
  • 103
  • Using git over the serialized output sounds smart and promising. Does this mean I'd also get "diffs" for free ? (hm on second thought I'd certainly need some kind of parser to turn textual diffs into sth meaningful) – Lorah Attkins Dec 20 '22 at 15:50
  • "sth meaningful" ???? – ravenspoint Dec 20 '22 at 15:51
  • I mean, a piece of text must be fed back into the data model of the graph. Whether that text lies 10 levels deep into say the serialized json or is an updated node I'd have to somehow figure out what it represents and reconstruct a data model that shows e.g. "one updated node", or "one updated edge" or "one deleted edge" – Lorah Attkins Dec 20 '22 at 16:06
  • 1
    Have you ever used tagging or temporal relations? My search shows that to be the most concise method of versioning but cannot find any canonical examples. It should be usable with BGL and boil down to adding an extra property to every entity (for tagging) or a "named relation" in the temporal case – Lorah Attkins Dec 20 '22 at 16:07
  • Please lose the attitude and all caps. My use case is exactly what I mention in the question – Lorah Attkins Dec 20 '22 at 16:19
  • 1
    I'm not sure that your use case is as well-described as you think, @LorahAttkins. While I recognize the frustration of explaining things that you thought clear, I'd say the questions being asked are reasonable. I have similar questions if I'm honest. We're all here trying to be helpful. – sehe Dec 20 '22 at 16:27
  • Sorry if my attitude offends you. I am trying to be as helpful as possible while staying within the word limit of these comments. I have edited my answer so as to make the same points in, hopefully, a more gentle fashion. Sometimes you need to accept help gracefully, no matter how rushed the helper might be. – ravenspoint Dec 20 '22 at 17:14
1

I would indeed reckon the data structure you are looking for is a versioning tree. E.g. one where nodes can be shared COW.

I'm not aware of things like this in the boost library.

But you can create your own data structures that have these properties. I have created one before for this answer: Transforming trees in C++ (look for CowTree).

Now, I'd be happy to draft something. However, your requirements are essential to the choices.

  • What trees would you need to be able to represent? I mean, are family trees expected to be strict DAG? That should be a safe assumption for biological family relations, but as soon as you include non-biological relations you could require more flexibility.

  • What kind of mutations exist to go from Treeᵥ₁ to Treeᵥ₂? If they are additive (again, as in history of biological relations) then you might be able to leverage graph adaptors:

    • subgraph, where previous versions are subgraphs of later versions (the overhead would grow linear with the number of versions)
    • filtered_graph, where you could use a "version timestamp" value with each node in the tree; your graph at a version (t) could then a view of the most recent version filtered for timestamp <= t. The big benefit would be that the overhead is constant regardless of the number of versions.

I have many examples of using either adaptor in existing StackOverflow answers, in case you want to get more inspiration.

I'll mull the options while making dinner. If I think of useful demo of either approach I'll add it tonight.

Added

I just thought of the idea to represent a graph as an append-only edge list; this would be a good match if the relations need to be versioned over time, relations never (or rarely) disappear and the contents of vertices themselves need not be versioned.

As you can see, for the example code this might be an excellent match, depending perhaps on the efficiency requirements on certain algorithms. To be future-requirement safe I wouldn't jump to this appraoch, but it may have the most merit until now.

sehe
  • 374,641
  • 47
  • 450
  • 633