6

I'm loading horse genealogical data recursively. For some wrong sets of data my recursion never stops... and that is because there are cycles in the data.

How can I detect those cycles to stop recurring?

I thought of while recurring maintain a hashTable with all "visited" horses. But that will find some false positives, because a horse CAN be twice in a tree.

What cannot happen is that a horse appear as a father or grandfather or great grandfather of ITSELF.

Romias
  • 13,783
  • 7
  • 56
  • 85
  • 1
    There is no such thing as a cycle in a binary tree. Even correct genealogical data would not be a binary tree. I edited the question to read "graph" – Kenan Banks Aug 06 '09 at 03:54
  • curious, is this for figuring out the Dosage Index, Werk Nick Rating or something similar (as in thoroughbreds)? – nlucaroni Aug 06 '09 at 15:57
  • Nope... I'm exporting the entire horse pedigree to a file. I'll use it also for detecting inbreeding but as my software product is not breed specific I have no much historical data to analyse. – Romias Aug 15 '09 at 17:53

6 Answers6

6

Pseudo code:

void ProcessTree(GenTreeNode currentNode, Stack<GenTreeNode> seen)
{
   if(seen.Contains(currentNode)) return;
   // Or, do whatever needs to be done when a cycle is detected

   ProcessHorse(currentNode.Horse); // Or whatever processing you need

   seen.Push(currentNode);

   foreach(GenTreeNode childNode in currentNode.Nodes)
   {
      ProcessTree(childNode, seen);
   }

   seen.Pop();
}

The basic idea is to keep a list of all of the nodes that we've already seen on our way down to the current node; if was get back to a node that we already went through, then you know that we've formed a cycle (and we should skip the value, or do whatever needs to be done)

Daniel LeCheminant
  • 50,583
  • 16
  • 120
  • 115
  • It seems to work... debuged with my finger :) I'll give it a try and think it some border case is missing. I'll let you know :) – Romias Mar 07 '09 at 03:47
  • Well... this worked like a charm. Anyway my test genealogical data was really a piece of junk but at least even in those awful cases my software tolerate it. I also added another stop conditions related with the lenght of the stack... to set a maximum depth in the tree. Thanks! – Romias Mar 07 '09 at 14:42
2

Maintain a stack of all elements leading up to the root of the tree.

Every time you advance down the tree, scan the stack for the child element. If you find a match, then you've discovered a loop and should skip that child. Otherwise, push the child onto the stack and proceed. Whenever you backtrack up the tree, pop an element out of the stack and discard.

(In the case of genealogical data, a "child" node in the tree is presumably the biological parent of the "parent" node.)

Matthew
  • 2,024
  • 15
  • 19
2

This sounds like a case where you can finally apply that interview trivia question: find a cycle in a linked list using only O(1) memory.

In this case your "linked list" is the sequence of elements you enumerate. Use two enumerators, run one at half speed, and if the fast one ever runs into the slow one then you have a loop. This will also be O(n) time instead of the O(n^2) time required for checking a 'seen' list. The downside is you only find out about the loop after some of the nodes have been processed multiple times.

In the example I've replaced the 'half speed' method with the simpler-to-write 'drop markers' method.

class GenTreeNode {
    ...

    ///<summary>Wraps an the enumeration of linked data structures such as trees and linked lists with a check for cycles.</summary>
    private static IEnumerable<T> CheckedEnumerable<T>(IEnumerable<T> sub_enumerable) {
        long cur_track_count = 0;
        long high_track_count = 1;
        T post = default(T);
        foreach (var e in sub_enumerable) {
            yield return e;
            if (++cur_track_count >= high_track_count) {
                post = e;
                high_track_count *= 2;
                cur_track_count = 0;
            } else if (object.ReferenceEquals(e, post)) {
                throw new Exception("Infinite Loop");
            }
        }
    }

    ...

    ///<summary>Enumerates the tree's nodes, assuming no cycles</summary>
    private IEnumerable<GenTreeNode> tree_nodes_unchecked() {
        yield return this;
        foreach (var child in this.nodes)
            foreach (var e in child.tree_nodes_unchecked())
                yield return e;
    }
    ///<summary>Enumerates the tree's nodes, checking for cycles</summary>
    public IEnumerable<GenTreeNode> tree_nodes()
    {
        return CheckedEnumerable(tree_nodes_unchecked());
    }

    ...

    void ProcessTree() {
        foreach (var node in tree_nodes())
            proceess(node);
    }
}
Craig Gidney
  • 17,763
  • 5
  • 68
  • 136
  • For now... the solution of the "seen" list is working. I realize that your approach should be more efficient. I have some trees to serach from Father to Childs and may be I can use your approach. Thanks any way. – Romias Mar 15 '09 at 16:23
1

A very simple way to detect this, is by checking that constraint itself:

What cannot happend is that a horse appear as a father or grandfather or greatgrandfather of ITSELF.

Whenever you insert a node in your tree, traverse the tree to the root to make sure that the horse does not exist as a any kind of parent.

To speed this up, you can associate a hashtable to each node, where you cache the answer of such a lookup. Then you don't have to search the entire path next time you insert a horse under that node.

Zuu
  • 1,123
  • 5
  • 11
0

Your hash table solution should work if you keep track of nodes instead of horses. Just make sure every time you read a new horse you create a new node even if the value/horse is the same as a previous node's value/horse.

C. Dragon 76
  • 9,882
  • 9
  • 34
  • 41
0

You're dealing with a directed acyclic graph, not a tree. There should not be any cycles, as a horse's descendant cannot also be its ancestor.

Knowing this, you should apply code techniques specific to directed acyclic graphs.

user59200
  • 384
  • 2
  • 2
  • When you have a horse, you always get a binary tree because a horse just have 2 parents. When this is not well formed sometimes you get a cycle. – Romias Mar 07 '09 at 03:41
  • Inbreeding causes this to be a DAG, not a binary tree. Like, Court Vision (http://www.pedigreequery.com/court+vision) having Native Dancer 4x5 and Nasrullah 5x5. Although presented here as a binary tree, it's really a DAG. – nlucaroni Aug 06 '09 at 16:11
  • Yeah... you are right... I was not considering the case of inbreeding as a being the SAME horse, just their position in the pedigree. – Romias Aug 15 '09 at 17:35