1

Consider this conceptual diagram below, which is for demonstrational purposes only.

   Abc       Foo
     \       /   \
      \     /     Foo2
        Bar        \
      /    \       Foo3
     Bar2   Bar3     \
     /   \           Foo4
     X    Y

In the above tree, there is a unique "path", Foo->Bar->Bar2->X. This path is distinct from Abc->Bar->Bar2->X .. Obviously this information is lost in the above represenation, but consider I have all the individual unique paths stored.

They do however, share some part of the path "Bar->Bar2->X".

The purpose of the algorithm I'm trying to either find or implement is that I would like to aggregate this information so I can not store individual paths. But more importantly, I'm trying to find all these common paths, and given them weights. So for example, in the above case, I could condense the information about the "Bar->Bar2->X" and say it happened 2 times. Obviously I'd require it to work for all cases.

And yes, the ultimate idea is to be able to quickly ask the question "Show me all the distinct paths from Foo". In this example there is only 1, Foo->Bar->Bar2->X. Foo->Bar->Bar2->Y and Foo->Bar->Bar3 do not exist. The diagram is for viewing purposes only.

Any ideas?

halivingston
  • 3,727
  • 4
  • 29
  • 42

2 Answers2

1

So this is just a starting point I hope others will help me fill in but I would think about them as Strings and look at the common sub paths as the common substring problem which has been looked at quite a bit in the past. Off the top of my head I might invert each path/string and then build a trie structure from those because then by counting the number of keys below a given node you can see how many times that ending path gets used... There is probably a better and more efficient way but that should work. Anyone else have ideas on treating them as strings?

hackartist
  • 5,172
  • 4
  • 33
  • 48
0

You could store each unique path separately. To answer questions such as “who does Foo call”, you could create an index in the form of a hash table.

As an alternative, you could try using DAWG, but I'm not sure how much would it help in your case.

svick
  • 236,525
  • 50
  • 385
  • 514