4

I'm struggling with a problem in which I have to find a way to store all the possible paths a program might follow. Take this image as an example. enter image description here

In that image each number represents a complex process that might call other processes to execute itself, describing all the paths you can see in the image.

All the continuous lines represent paths that the process necessarily has to follow, while the dashed lines represent paths that are optional.

Knowing the execution starts from left to right and from top to bottom, one must always keep in mind that if a branch has already been built it must be reused and never rebuilt again.

In this another image for example the yellow line represents all the paths that were follow during the execution of process number 37.

enter image description here

In it you can see that the path starting with the process 18 (18->17->16) was previously built and so when process 19 is reached it shouldn't be rebuilt as all these processes take quite some time and it'd be a waste of time to try to build them again already knowing the results they produced. Instead, if a certain number in found to have been built previously (e.g process 18) it should be copy/attached to the process that called it (process 19 in the image). All this is for a log in which I have to store all the paths complete, that's why I mention the part of copying/reusing branches as I'll later have to query that log to display all these paths.

To execute all these process currently a recursive process is used, but since it doesn't consider that it can reuse paths that were previously build, the whole thing takes ages.

Do you know of any data structure that might help me optimize this process so that if a process was already executed it is only reused. As for the log, as I mentioned above I need to store the complete paths.

Any ideas or resources would be highly appreciated.

Thanks

Edit 1

----------------------------

One thing that perhaps I didn't make very clear, is that the data struture I need to create has two purposes:

  1. To keep track of all the paths that the main process (37 in the example) followed during its execution,giving me the chance to tell at any time if a certain path was already followed, being able to then copy that path to the node that was supposed to called it (in the example, copying the whole branch: 18->17->16 to process 19.
  2. By, giving me the opportunity to tell whether a path is already in this data structure or not, I can avoid executing subprocesses that were already executed and which results are known and with that, optimizing the whole proccess of execution.

Edit 2

----------------------------

Regarding the question about why I didn't consider using a Dictionary, well I had that idea at first, but then I couldn't find a way that I dictionary could tell me ,for example, that the path starting with 18 (18->17->16) descends from both process 37 and 19. You see, a node can have one or more parents. How could I express that with a Dictionary?

eddy
  • 4,373
  • 16
  • 60
  • 94
  • 1
    How is this not trivially a database table with a Primary Key? – Pieter Geerkens Jan 08 '16 at 13:09
  • 1
    You did not specify, why you cant hold results in HashMap (dictionary)...? – libik Jan 08 '16 at 13:12
  • I would use a tree (with process n°37 being the root) and do a quick check when inserting a new node (i.e process n°18) that it is not already in the tree that i'm building... – guillaume guerin Jan 08 '16 at 13:15
  • @PieterGeerkens because all those processes use a compiler (in C#) to evaluate expressions and only if I find a number I need to query my db to retrive data required to execute the process identified by that number, Otherwise, there's no need to query the db. – eddy Jan 08 '16 at 13:39
  • Possible duplicate of [Tree data structure in C#](http://stackoverflow.com/questions/66893/tree-data-structure-in-c-sharp) – Liam Jan 08 '16 at 13:49
  • @guillaumeguerin Could you give me more details? I have used trees before. – eddy Jan 08 '16 at 14:40
  • @libik Could you please take a look at **Edit 2** in my question? Thank you. – eddy Jan 08 '16 at 15:03
  • Instead of searching for complex data structures, wouldn't just a simple List array would do that keeps the possible paths to ancestors? – Cetin Basoz Jan 08 '16 at 15:07
  • But would a simple List<> allow me to quickly search for a certain key? For that reason I even consider using a Dictionary, but then I realise that I may have duplicate items since a item can be related to multiple items. (e.g Process 18) – eddy Jan 08 '16 at 15:10
  • As far as I can see your data structure is a Directed Acyclic Graph (DAG). When you traverse a node you evaluate some (complex) property and you want to cache this evaluation for future traversals of the DAG. If that is a correct explanation you might look into general DAG implementations in C#. – Martin Liversage Jan 08 '16 at 15:48
  • @MartinLiversage Will DAG let me retrieve all the children of a node recursively? – Axel Jan 08 '16 at 20:58

3 Answers3

4

I believe this is the data structure that you're looking for:

var paths = new Dictionary<int, HashSet<int>>()
{
    { 37, new HashSet<int>() { 18, 33, 34, 35, 36, } },
    { 18, new HashSet<int>() { 17, } },
    { 33, new HashSet<int>() { } },
    { 34, new HashSet<int>() { 19, 17, 15, } },
    { 35, new HashSet<int>() { 17, } },
    { 36, new HashSet<int>() { } },
    { 17, new HashSet<int>() { 16, } },
    { 19, new HashSet<int>() { 12, 18, } },
    { 15, new HashSet<int>() { 14, } },
    { 16, new HashSet<int>() { } },
    { 12, new HashSet<int>() { 11, } },
    { 14, new HashSet<int>() { } },
    { 11, new HashSet<int>() { } },
};

Here's the code to add a path to the paths:

public bool TryAddPath(Dictionary<int, HashSet<int>> paths, int x, int y)
{
    if (!paths.ContainsKey(x))
    {
        paths[x] = new HashSet<int>() { };
    }

    if (!paths[x].Contains(y))
    {
        paths[x].Add(y);
        if (!paths.ContainsKey(y))
        {
            paths[y] = new HashSet<int>() { };
        }
        return true;
    }
    return false;
}

The data structure above can be built by:

var paths = new Dictionary<int, HashSet<int>>();
var results = new bool[]
{
    TryAddPath(paths, 37, 18),
    TryAddPath(paths, 37, 33),
    TryAddPath(paths, 37, 34),
    TryAddPath(paths, 37, 35),
    TryAddPath(paths, 37, 36),
    TryAddPath(paths, 18, 17),
    TryAddPath(paths, 17, 16),
    TryAddPath(paths, 34, 19),
    TryAddPath(paths, 34, 17),
    TryAddPath(paths, 34, 15),
    TryAddPath(paths, 19, 12),
    TryAddPath(paths, 19, 18),
    TryAddPath(paths, 12, 11),
    TryAddPath(paths, 18, 17),
    TryAddPath(paths, 17, 16),
    TryAddPath(paths, 17, 16),
    TryAddPath(paths, 15, 14),
    TryAddPath(paths, 35, 17),
    TryAddPath(paths, 17, 16),
};

This returns the array { true, true, true, true, true, true, true, true, true, true, true, true, true, false, false, false, true, true, false, } which shows the paths that didn't need to be processed.

To then get a way to backtrack up the list do this:

ILookup<int?, int> parents =
    paths
        .Keys
        .AsEnumerable()
        .SelectMany(
            k => paths[k].Select(x => (int?)x).DefaultIfEmpty(),
            (k, v) => new { k, v })
        .ToLookup(x => x.v, x => x.k);

Now I can ask parents[17] and I get { 18, 34, 35, } returned. I can even do parents[null] and I get back { 33, 36, 16, 14, 11, } which shows the nodes that are leaves.

Enigmativity
  • 113,464
  • 11
  • 89
  • 172
  • Enigmativity Thank you. Do you think you could take a look at **Edit 2** in my question?. Once again, thank you. I really appreciate you take the time to help me. – eddy Jan 08 '16 at 15:06
  • @eddy - I've edited my answer to show how to traverse for the parents. – Enigmativity Jan 08 '16 at 15:26
  • If I'm not mistaken, you take another approach to solve the problem. Instead of trying to create a parent-child relationship, you store all the possible `connetions` between two nodes and then query all those conections to retrieve the actual paths that were follow. Is that right? – eddy Jan 08 '16 at 15:40
  • 2
    @eddy - well, you could probably look at this like a parent-children relationship, but it's really just a **n-ary tree**. I'm then just creating a reversed tree (i.e. `ILookup parents`) by querying my `paths` object to go back up the tree. – Enigmativity Jan 08 '16 at 15:43
  • 1
    Enigmativity, I really like your solution because from what I can understand it isn't necesary to copy elements from one node to another. Instead, if a process is related to another it just takes to add another item composed by that process and the one that is related to. – eddy Jan 08 '16 at 15:54
  • @Enigmativity the AddPath method can be used two both check if an item exists and also if it doesn't, add it to the dictionary. Is that correct? – Axel Jan 08 '16 at 16:05
  • @Axel - `AddPath` can only be used to attempt to add a path. If it adds it it returns `true` and if not `false`. You could directly check `paths` to see if a node exists. – Enigmativity Jan 08 '16 at 16:13
  • @eddy - I take it back - this isn't an **n-ary tree** - this is a **directed acyclic graph**. You could look that one up on Google for more detail. – Enigmativity Jan 08 '16 at 16:18
  • @Enigmativity By using `ContainsKey`, I guess? One thing I don't understand is why when I ask `parents[17]` it returns `{18,34,35,0}` why zero is inclueded? – Axel Jan 08 '16 at 16:31
  • @Enigmativity Just like you traversed for parents, is it possible to traverse for children? – Axel Jan 08 '16 at 21:23
  • @Axel - When you call `parents[17]` the `0` **isn't** included. Only three numbers are returned: `{ 18, 34, 35, }`. – Enigmativity Jan 09 '16 at 00:39
  • @Axel - Traversing the children is just `paths[key]`. Just the same as `parents[value]` really. – Enigmativity Jan 09 '16 at 00:40
  • @Axel - I've changed `AddPath` to `TryAddPath` as I think that is a more meaningful name. – Enigmativity Jan 09 '16 at 00:44
  • @Enigmativity , I think Axel is right. When I use your code, `parents[17]` shows a collection of 3 elements but when you see the collection it actually has an extra item with a value of zero. ![screen capture](http://s23.postimg.org/3yvypwx17/Extra.jpg). – eddy Jan 09 '16 at 12:51
  • @Enigmativity Actually when I convert the value returned by parents[17] to a List , I get only the values that should be returned. The zero elements is removed. – eddy Jan 09 '16 at 13:04
  • @eddy - There isn't a zero element in the first place. – Enigmativity Jan 09 '16 at 13:53
  • @Enigmativity But there is a zero element. Take a look at this capture http://s23.postimg.org/3yvypwx17/Extra.jpg – eddy Jan 09 '16 at 14:12
  • @eddy - That's just an implementation detail for the `.ToLookup(...)` operator. It appears to be re-allocating arrays by factors of two. So to hold 3 items it would have an array of 4 elements. The last element is just for future capacity if needed. If there were 5 elements then the array would be 8 long. If there where 9 it would be 16, etc. – Enigmativity Jan 09 '16 at 14:44
  • @Enigmativity I see. Guess that's just the normal behaviour of .`ToLookup` – eddy Jan 09 '16 at 15:43
  • @Enigmativity Your answer has saved me a lot of time. Just one last question. Base on the example 4 of the OP, will this implementation let me get an array of array like this : `0: 37` | `1 : 18 , 34` | `2: 17,19,15` | `3 : 16,18,14` | `4 : 17` | `5: 16` . Here the number before the `:` character represents the position in the array. This would help me a lot, because otherwise, I can't see how will I be able to store something like this in a db. – Axel Jan 09 '16 at 15:56
  • @Enigmativity Thank you very much. You've been very kind and helpful. Not only you gave me a ready-to-use solution, but you also took the time to answer my questions. I really apreciate that. Once again, thank you. – eddy Jan 09 '16 at 16:00
  • @Axel - Why do you need to store indices? You only need to store a list of pairs and then call `TryAddPath` to rebuild the data structure. – Enigmativity Jan 10 '16 at 00:59
  • @Enigmativity Because once I got all the possible paths I need to store them in a db, so that I can later retrieve that data and display it using D3.js – Axel Jan 10 '16 at 01:29
  • @Axel - yes, but you don't need to store `0 : 37` | `1 :18, 34` etc. You just need to store `37, 18`, `37, 33`, ... , `17, 16`. Just a series of paths is enough. Right? – Enigmativity Jan 10 '16 at 01:36
  • @Enigmativity Yes, with that I will be able to reconstruct the `Dictionary`, but I guess what I'm struggling to understand here is how I can go from that series of paths to the tree-like diagram the OP show in his second image (the paths in yellow) – Axel Jan 10 '16 at 01:49
  • @Axel - I don't understand why you want to do that? – Enigmativity Jan 10 '16 at 02:11
  • @Enigmativity It's just that I need to find a way to store all the paths to a database so that later I can retrieve that data and build diagrams that make it easy for users to understand all those paths. The problem is that in the past when I have created tree diagrams I always need it some sort of parent-child relationship and I don't know how to use the series of paths for that purpose :( – Axel Jan 10 '16 at 13:50
  • @Axel - The data structure is pretty much a parent-child relationship. I don't understand what your struggle is. You can pull out the pairs with a basic `SelectMany` query - `from p in paths from v in p.Value select new { p.Key, v }`. – Enigmativity Jan 10 '16 at 14:02
  • @Enigmativity I guess that if I need to reconstruct the whole path for 37, all I need is to retrieve its children, 18 and 34, and then recursively retrieve the children for each of them up until I reach the leaves, is that right? – Axel Jan 10 '16 at 14:13
  • @Axel - Yes, but start with `parents[null]` and recursively work upwards. That would be the easy way to reconstruct each full path. – Enigmativity Jan 10 '16 at 14:20
1

Use two data structures. It seems that you're already using a tree that you traverse to find the next node that you need to process; that's fine, keep that tree. But also create a new data structure such as a Dictionary to keep track of the already-processed nodes, allowing you to quickly look up a node by its ID and see the results of previous computations of it.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • Please see the **edit** in my question. It gives more details of what I'm after. Thanks – eddy Jan 08 '16 at 14:09
  • @eddy I stand by the answer, after reading the edit. You want a tree (apparently starting out incomplete?) with a separate lookup of already-computed results. – Servy Jan 08 '16 at 14:11
  • Servy, currently I'm not using a tree to find the next node to process. You see, every process is formed of several expressions (e.g. `If ([AGE]>20) then PRO#18#RET1`) in which you can find a reference to another process identified by its number(18) and follwed by the result of that process that is going to be used (RET1).It's only when I find these kind of expressions that I know I have to call another process. – eddy Jan 08 '16 at 14:19
  • @eddy That sounds like you functionally have a tree, although given that you've shown virtually nothing of what you're actually doing, I of course can't tell for sure. If you're doing something completely different then you need to explain that in your question. But again, it seems that all you need to add is a dictionary mapping ids to their computed results. – Servy Jan 08 '16 at 14:23
  • Do you think it'd be possible to create a tree and pass it by reference to the recursive method I use to execute the processes? And then when the recursive method is done I have the tree with all the nodes that describe acurately all the paths that were followed. Of course during the execution I have to be able to query the tree and also to copy branches from one node to another like the case of the branch `18->17->16` that is needed to be copy from node **37** to node **19**. Will I be able to do all that with a single tree? – eddy Jan 08 '16 at 14:31
  • @eddy Yes, you could. – Servy Jan 08 '16 at 14:36
  • Servy, sorry to keep bothering you, but could you please take a look at **Edit 2** in my question? – eddy Jan 08 '16 at 15:02
0

To eloborate on my comment of simply using a List here is what I meant (partial):

void Main()
{
  var data = new Dictionary<int, List<int[]>>();

  data.Add(16, new List<int[]> {
    new int[]{ 37, 18, 17, 16 },
    new int[]{ 37, 34, 19, 18, 17, 16 },
    new int[]{ 37, 34, 17, 16 },
    new int[]{ 37, 35, 17, 16 }});

  data.Add(17, new List<int[]> {
    new int[]{ 37, 18, 17 },
    new int[]{ 37, 34, 17 },
    new int[]{ 37, 35, 17 }});


  data.Add(18, new List<int[]> {
    new int[]{ 37, 18 },
    new int[]{ 37, 34, 19, 18 } });

  var node35_IsAncestorOf = data
    .Where(d => d.Value.Any(v => v.Contains(35)))
    .Select( d => d.Key);   

  node35_IsAncestorOf.Dump();    // LinqPad
}

Also this structure is like a directory structure where 16, 17 etc are NodeText and not NodeId so a tree structure would work (like a filename could exist with the same name under different directories).

PS: At database level, this looks like a graph database like Neo4j, OrientDb, VelocityGraph ... Most of them are opensource if you want to check their data structures.

Cetin Basoz
  • 22,495
  • 3
  • 31
  • 39