1

I'm looking for an inexact graph matching algorithm on graphs with labeled vertices and labeled, directed edges. My task is to detect changes to two graphs to display them to the developer (think subversion diff). I've already implemented an optimization algorithm based on tabu search (this), but I can't get the algorithm to consider my edge labels. My graphs have at most 120 vertices and 200 edges, so I might get away with a slower but simpler to implement algorithm.

Here is an example for your viewing pleasure, : Cfc Chart

Community
  • 1
  • 1
  • What do you mean by 'inexact matching'? Also, what are you trying to do - graph layout? Finally, how does the optimization algorithm fit in here? Perhaps you are matching to laid-out templates or something. – gilleain Jan 29 '16 at 21:16
  • By inexact matching I meant "error correcting" matching, because the second graph might be mutated (labels, inserted/deleted vertices, inserted/deleted edges). My task is to detect changes to these graphs to display them to the developer (think subversion diff). The optimization algorithm is my current algorithm, but I can't customize it to respect my edge labels and so in highly symetrical graphs the matchings are quite bad. – One Man Monkey Squad Jan 30 '16 at 08:48
  • By the way, what do you mean by "I can't get the algorithm to consider my edge labels"? The algorithm you mentioned appears to be able to deal with edge labels. – Goblin Alchemist Feb 01 '16 at 15:37
  • Parts of the text state so, but the text does not mention details. I tried 2 ways: Use the labels in the global Page Rank value and use the labels in the local similarity value. But I could not get it to work properly. I bet I overlooked something. – One Man Monkey Squad Feb 03 '16 at 16:51

1 Answers1

1

Since no one has proposed an existing algorithm, I'll try to invent one...

For each vertex, you can calculate its "signature" by concatenating its label with the labels of all adjacent edges. For consistency, sort the labels alphabetically. Since the edges are directed, concatenate incoming and outgoing edges separately.

These signatures can be used to detect changes in the set of vertices. First, find corresponding vertices with the same signature in the first and the second graph. Remaining unpaired vertices are added vertices, removed vertices, vertices with changed labels, vertices with changed edge connections, and vertices whose edges' labels were changed. You can associate them by comparing their signatures and selecting best matches using some string matching algorithm. Apparently you will have to introduce some critical degree of similarity to distinguish "it's the same vertex with many changed properties" from "it's a new vertex with some accidental signature similarity".

Arrange all vertices of the first graph in an array, in any order. Create another array of the same size. Put the matching vertices of the second graph into the second array at positions corresponding to the first array; do this for all exactly matched vertices and all modified vertices. For first graph vertices which don't have a match in the second graph (deleted vertices), leave the array cells empty. Then, for second graph vertices which don't have a match in the first graph (new vertices), add these vertices to the end of the second array and expand the first array with the corresponding number of empty cells.

Now, when vertices of a graph are listed in an array, the edges can be represented as a 2-dimensional array. If an edge is going from the i-th vertex to the j-th vertex, put its label into the (i,j) cell of the array.

Do this for both graphs. Since you have constructed two vertex arrays of the same size, you get two 2-dimensional arrays of the same size, with a one-to-one correspondence. Comparing these two arrays in a straightforward way allows you to detect added edges, removed edges and edges with changed labels.

Community
  • 1
  • 1
Goblin Alchemist
  • 829
  • 5
  • 11
  • This seems similar to my initial algorithm. The problem here is that it ignores the actual structure of the graph. In graphs with many similar nodes , mirrored segments, vertices without labels, ... this fails completely, which is not acceptable. – One Man Monkey Squad Feb 03 '16 at 16:54
  • My current algorithm for now (workaround): Initial assigment with nodes which are unique and equal. Each step, expand the matching from any already matched node.When there are no new matches during expansion, but unassigned vertices match the most promising ones and expand again. Repeat. The big no-no here is that the initial assigments must be 100% correct. Some in my graphs are (special vertices which always belong to each other). But when there are too much changes the algorithm quickly falls apart. – One Man Monkey Squad Feb 03 '16 at 16:57
  • 1
    @SirPolly, yes, my proposal assumed that most vertices and edges have labels and that most labels are unique. One possible idea is to include not only adjacent edge labels in the signature but also adjacent vertices, next-level edges etc. But in case of large repeated blocks this will also fail. A possible improvement to your expanding algorithm is to try choosing different starting matches (even with different labels) and then compare expansion results. Of course this will be much slower. Regarding tabu search, probably you could contact the article authors and ask them for omitted details. – Goblin Alchemist Feb 04 '16 at 13:22
  • http://www.vldb.org/pvldb/vol8/p1010-kazemi.pd - This algorithm is similar to my makeshift approch, but with proper research and though. – One Man Monkey Squad Feb 24 '16 at 16:23