1

I did a search on similar topics, but the answers are too vague for my level of understanding and comprehension, and I don't think they're specific enough to my question.

Similar threads:
Tree (directed acyclic graph) implementation
Representing a DAG (directed acyclic graph)

Question:

I have formatted a text file which contains data of the following format...
Example dataset:
GO:0000109#is_a: GO:0000110#is_a: GO:0000111#is_a: GO:0000112#is_a: GO:0000113#is_a: GO:0070312#is_a: GO:0070522#is_a: GO:0070912#is_a: GO:0070913#is_a: GO:0071942#part_of: GO:0008622
GO:0000112#part_of: GO:0000442
GO:0000118#is_a: GO:0016581#is_a: GO:0034967#is_a: GO:0070210#is_a: GO:0070211#is_a: GO:0070822#is_a: GO:0070823#is_a: GO:0070824
GO:0000120#is_a: GO:0000500#is_a: GO:0005668#is_a: GO:0070860
GO:0000123#is_a: GO:0005671#is_a: GO:0043189#is_a: GO:0070461#is_a: GO:0070775#is_a: GO:0072487
GO:0000126#is_a: GO:0034732#is_a: GO:0034733
GO:0000127#part_of: GO:0034734#part_of: GO:0034735
GO:0000133#is_a: GO:0031560#is_a: GO:0031561#is_a: GO:0031562#is_a: GO:0031563#part_of: GO:0031500
GO:0000137#part_of: GO:0000136

I'm looking to construct a weighted directed DAG from this data (the above is just a snippet). The whole dataset of 106kb is here: Source

--------------------------------------------------

Taking into consideration line-by-line, the data of each line is explained as follows...
First line as an example:
GO:0000109#is_a: GO:0000110#is_a: GO:0000111#is_a: GO:0000112#is_a: GO:0000113#is_a: GO:0070312#is_a: GO:0070522#is_a: GO:0070912#is_a: GO:0070913#is_a: GO:0071942#part_of: GO:0008622

'#' is the delimeter/tokenizer for the line data.
The First term, GO:0000109 is the node name.
The subsequent terms of is_a: GO:xxxxxxx OR part_of: GO:xxxxxxx are the nodes which are connected to GO:0000109.
Some of the subsequent terms have connections too, as depicted in the dataset.
When it is is_a, the weight of the edge is 0.8.
When it is part_of, the weight of the edge is 0.6.

--------------------------------------------------

I have Google-d on how DAGs are, and I understand the concept. However, I still have no idea how to put it into code. I'm using Java.
From my understanding, a graph generally consists of nodes and arcs. Does this graph require an adjacency list to determine the direction of the connection? If so, I'm not sure how to combine the graph and adjacency list to communicate with each other. After constructing the graph, my secondary goal is to find out the degree of each node from the root node. There is a root node in the dataset.

For illustration, I have drawn out a sample of the connection of the first line of data below:
Image Link

I hope you guys understand what I'm trying to achieve here. Thanks for looking through my problem. :)

Community
  • 1
  • 1

2 Answers2

1

Because it's easier to think about, I'd prefer to represent it as a tree. (Also makes it easier to traverse the map and keep intermediate degrees.)

You could have a Node class, which would have a Collection of child Node objects. If you must, you could also represent the child relationships as a Relationship object, which would have both a weight and a Node pointer, and you could store a Collection of Relationship objects.

Then you could do a walk on the tree starting from the root, and mark each visited node with its degree.

class Node{
    String name;
    List<Relationship> children;
}

class Relationship{
    Node child;
    double weight;
}

class Tree{
    Node root;
}

Here, Tree should probably have a method like this:

public Node findNodeByName(String name);

And Node should probably have a method like this:

public void addChild(Node n, double weight);

Then, as you parse each line, you call Tree.findNodeByName() to find the matching node (and create one if none exists... but that shouldn't happen, if your data is good), and append the subsequent items on the line to that node.

As you've pointed out, DAGs cannot really be converted to trees, especially because some nodes have multiple parents. What you can do is insert the same node as the child of multiple parents, perhaps using a hash table to decide if a particular node has been traversed or not.

  • could you, perhaps, help me with your suggestion by providing some sort of pseudocode of your implementation? :/ – bumper_boy2000 Dec 15 '11 at 08:47
  • ok... taking into consideration a node is called VNode. how about the arcs? – bumper_boy2000 Dec 15 '11 at 09:01
  • oh ok i missed the part of addchild. so in fact the arcs don't need to be made as a class/object? – bumper_boy2000 Dec 15 '11 at 09:13
  • The `Relationship` class would represent the edges. –  Dec 15 '11 at 23:58
  • ok i'm stumped on how Node has a list of Relationships, while a Relationship has a VNode as a property. ahhhhhh! i can't imagine how to construct, though I can figure out how to query from the graph if it were completed. :/ – bumper_boy2000 Dec 19 '11 at 09:15
  • mulitple parents, multiple children. :( – bumper_boy2000 Dec 19 '11 at 15:36
  • Not multiple parents - each Node will have only one parent (exception: the root will not have a parent) but may have any number of children. Also, each Node is aware only of its children and oblivious of its parent. – Emil Lundberg Dec 20 '11 at 00:25
0

Reading the comments, you seem confused by how a Node can contain Relationships which each in turn contains a Node. This is quite a common strategy, it is in general called the Composite pattern.

The idea in the case of trees is that the tree can be thought of as consisting of multiple subtrees - if you were to disconnect a node and all its ancestors from the tree, the disconnected nodes would still make a tree, though a smaller one. Thus, a natural way to represent a tree is to have each Node contain other Nodes as children. This approach lets you do many things recursively, which in the case of trees is often, again, natural.

Letting a Node keep track of its children and no other parts of the tree also emulates the mathematical directed graph - each vertex is "aware" only of its edges and nothing else.

Example recursive tree implementation

For instance, to search for an element in a binary search tree, you would call the root's search method. The root then checks whether the sought element is equal, less or greater than itself. If it is equal, the search exits with an appropriate return value. If it is less or greater, the root would instead call search on the left or right child, respectively, and they would do exactly the same thing.

Analogously, to add a new Node to the tree, you would call the root's add method with the new node as a parameter. The root decides whether it should adopt the new node or pass it on to one of its children. In the latter case, it would select a child and call its add method with the new Node as a parameter.

Emil Lundberg
  • 7,268
  • 6
  • 37
  • 53
  • ah. would it be right to say that in order to construct the tree, the only way is to recursively construct from the leaf? – bumper_boy2000 Dec 20 '11 at 06:35
  • It's not the only way, but definitely a common one, many programmers find recursive algorithms appealingly elegant. However, I think it's easier to build the tree from the root rather than the leaves. See my edited answer. – Emil Lundberg Dec 20 '11 at 10:38
  • alright. recursive method design is one of my weaknesses, but i'll work on it as good training! :D thanks Emil! – bumper_boy2000 Dec 20 '11 at 15:07