4

I am new to Tree structures. I am trying to design an algorithm which takes as input a predetermined tree. It should then seed the root of the tree with an object of my own design (I don't think it's relevant to the design of the algorithm, but the object is an ArrayList), and uses a "change operator" to modify the object along each branch of the tree. The number of passes of the change operator is dependent upon the branch length. The object must be evaluated at each node of the tree, and this node value is then used to evaluate the values of its child-nodes. Therefore, I cannot, simply evaluate the value at each tip using the total length from root to tip, as the change operator I am using is stochastic, rather than deterministic. Therefore the value of each child node is dependent upon the value of its parent node.

I have attempted to design this myself. I started by creating an array of times at which a single branch would split into its children.

int[] times = new int[branchnumber];
    times[0] = 1;
    times[1] = 5;

Then, I created a method which took the times at which branching should occur (which I interpreted as a branch length), the ArrayList object, the current time, and the total number of branches.

public static void Brancher(int[] times, List<double[][]> sequences, int t){
    boolean checker = false;
    for (int i = 0; i < times.length; i++) {
        if (times[i] == t) {
            checker = true;
        }
    }
    if (checker == true) {
        double[][] seq = sequences.get(sequences.size() - 1);
        sequences.add(seq);
    }
}

The Brancher method is implemented as below:

for (int j = 0; j <= loopnumber; j++) {
        MathsOperators.Brancher(times, sequences, t);

        for (int i = 0; i < sequences.size(); i++) {
            double[][] sequenceholder = sequences.get(i);
            MathsOperators.PrintOutput(sequenceholder, t);
            ComplexInput.evolvesequence(sequenceholder, frequencies, transitionrate, transversionrate, random);
            sequences.set(i, sequenceholder);
        }

        t = t + dt;
    }

So, I hold the current state of all objects in the tree as Arrays held in the sequences ArrayList. These objects are then acted upon by the evolve method, and the updated objects replace the originals in the ArrayList. When a branch is to be added, the Brancher method takes the last Array object in the ArrayList, copies it, and adds the copy to the list. This has in effect simulated the splitting of the last object into two objects. The copy is then updated and evolved in the next iteration of the simulation loop.

This method of doing things, while not pretty, results in trees that look like this: http://content.science20.com/files/Tree3A.jpg . This is a relatively uncomplicated structure.

However, it is impossible to create a tree of this shape: http://content.science20.com/files/Tree4.jpg . This tree has multiple splits to the far right. I don't know the exact terminology to describe the differences between the trees here, but its reasonably obvious.

I think I am confusing myself here. Any advice about how to think about this problem would be greatly appreciated.

(If it helps with context, the input object is a genetic sequence (ACGT). The idea is to evolve the sequence along each branch of the tree, with each tip representing the descendant of the original ancestor (input) sequence)

EDIT

The input object is an ArrayList, containing multiple (n x 2) Arrays of doubles. The first column of each array contains n integers from the set {1,2,3,4}, with frequencies I control. However, the order of these characters in the column is random. Each integer represents one of the characters A,C,G,T in a DNA sequence. The second column contains a double representing the rate of evolution for each DNA site, so one rate entry for each integer entry. The initial Array is generated using a function I call DNASEQ, it's very long and doesn't make a difference to this question.

To start, I generate one of these Arrays, and add it to an ArrayList called sequences. Using the simulation loop shown above, I then apply the evolvesequence method to each Array in the ArrayList.

When Brancher detects that the time, which is updated after every loop in increments of 1, is equal to one of the times of branching specified, then it takes the last Array in the ArrayList and adds a copy of the array to the Arraylist. If the simulation stopped there, the Array would be identical to the one it was copied from. However, now that it is in the ArrayList, it is subject to the evolvesequence method. Since evolvesequence is a stochastic/probabilistic method, then for a given input, no two outputs are guaranteed to be identical. So, after a few iterations of the simulation loop, the copied Array will be very different from the original.

The original sequence is displayed, and then copied by Brancher. Now I have two sequences of identical origin, evolving independently. This is the same as a branch of a tree.

  • I don't really get what you're trying to do here, but wouldn't a real tree data structure make more sense, see e.g. here: http://stackoverflow.com/questions/3522454/java-tree-data-structure? I don't really understand you code, e.g., incrementing branchcount seems to have no effect...? Perhaps you might want to provide more code? – stef77 Jul 30 '15 at 16:33
  • Maybe I didn't explain this properly. The tree is supposed to serve as a guide or pathway, down which the "evolution operator" runs. It takes the value of the object at the root node, evolves it down both branches, and evaluates the modified object at the ends of the branches. Then, it should take the new values for each node, and follow whatever branches are specified below those nodes, and so on until it reaches the tips of the tree. My branching algorithm can only result in a tree of this shape, http://content.science20.com/files/Tree3A.jpg, as opposed to a more complex shape, with subtrees – Jake Watson Jul 30 '15 at 16:56
  • An example of what I mean by more complex: http://content.science20.com/files/Tree4.jpg. Sorry, I don't know the exact terminology of what I'm trying to express. – Jake Watson Jul 30 '15 at 17:01
  • I've tried to expand upon my original post, and added another code snippet. Does this help with your understanding of my aim? – Jake Watson Jul 30 '15 at 17:17
  • Can you perhaps give a small example of an input and the desired output? Doesn't have to include all data of the output, i think, perhaps you can simplify the output to be a simple number or string? And/or perhaps you can map your terminology to the Tree3A.jpg you provided, i.e., what are `times`, `branch`, `tip`, or `node` in this image? How is `sequences` populated initially? And: I would have to try it, but I'm pretty certain that by getting an array from the list and adding it again, you have the exact same object twice in your list, and not a copy of it - is this what you really want? – stef77 Jul 31 '15 at 06:54
  • Have a look at the latest edit, I've included a screenshot of the output. – Jake Watson Jul 31 '15 at 10:36
  • In the 3rd snippet of code: what's the initial value of **t** and **dt**? – rolgalan Jul 31 '15 at 11:23
  • So, if I understood right: each iteration you take a node and create two different branches. One of them will be retained becoming a leaf in the tree, and the other will be taken as the input node for the next iteration. The leaf branch will be evolvesequenced once and the other branch will be evolvesequence five times (according to *times* array). Is it right? – rolgalan Jul 31 '15 at 11:33
  • The initial value of t is zero, and dt is 1. That idea is mostly right, it does not create two different branches every iteration, only only the iterations where t is equal to one of the entries in the `times` array. Both branches are constantly evolved until the loop finishes, the `Brancher` method only creates more branches to be evolved. – Jake Watson Jul 31 '15 at 11:45
  • I've read this now over and over again - I'm sorry, but what is your question again? You have an algorithm, an output - what is wrong, or what is the exact problem? – stef77 Jul 31 '15 at 12:29
  • The problem, is that my method/algorithm can only produce trees of the shape shown in the first link. It cannot provide trees with multiple subtrees, this is a limitation of my Brancher method. I can't see, after some thinking, how to generalise it to produce trees of the shape shown in the second link. I might be thinking about the whole problem in a way which makes it more complex than it needs to be, so are there any other ways I could do this? – Jake Watson Jul 31 '15 at 14:06
  • Ok - that was totally not clear to me and still isn't from reading your question again. Should have realized sooner. Yes, it seems you're creating some sort of very complex problem. I come back to my first suggestion, use a well-defined tree, and tried to formulate this as an answer. – stef77 Jul 31 '15 at 15:37

1 Answers1

1

With an isolated view on your specific task, it may be beneficial to implement a very special algorithm for your very special tree.

However, as you can see in the comments, this makes it way harder for outsiders to understand your code, yet alone help you. Therefore, I suggest you use this structure: Java tree data-structure?.

The benefits are that this is a well-known structure and everybody with some knowledge about data structures should be able to help you. The structure is generic enough that it should be able to reflect all variants of trees you're trying to build.

Moreover, it seems that you're constructing the tree and at the same time you populate it with data, is this correct? This occurs to me just now. It doesn't look like these two steps are dependent on each other, so you could first create your tree and then traverse it in a second loop in order to populate it with data? If you can separate these two steps, your code will become much clearer and easier to understand.

If you just want a binary tree (i.e., a tree having at most two children at each node), you could even simplify the structure by not having an ArrayList<Node<T>> as children, but a leftChild and a rightChild.

Afterwards, you set the data of the root node to your initial double[][], then you traverse the tree in the order appropriate to your task and modify each node's data. From each node, you can access its children and its parent, i.e., from every node in the tree, you can access the complete information stored in the tree, which should provide you with all necessary information to update a node's data. Of course, you'd have to visualize the tree and know exactly on which node you are right now, what information you need and how you can access it (i.e., how to navigate to the information you need).

Community
  • 1
  • 1
stef77
  • 1,000
  • 5
  • 19