I want to set the "edge.length" of a phylo object from a variable in a data.frame. The "node.label" and "tip.label" values in the phylo object correspond to the row names of the data.frame. How can edge.length be set from that variable while ensuring that the values are matched correctly, so that each edge gets the value from the row whose name equals the node.label or tip.label of that edge? In the code below this is step 3.

## R code:
## load ape and data.tree
library(ape)
library(data.tree)
## 1. A phylo object:

A1  <- Node$new("A1")
B1  <- A1$AddChild("B1")
C1  <- B1$AddChild("C1")
D1  <- C1$AddChild("D1")
E1 <- C1$AddChild("E1")
F1 <- E1$AddChild("F1")
G1 <- E1$AddChild("G1")
H1 <- G1$AddChild("H1")
A1.phylo <- as.phylo.Node(A1)


## 2. A data.frame:
set.seed(1)
df <- as.data.frame(rnorm(7, 5, 3))
names(df) <- "length"
row.names(df) <- c("B1","C1","D1","E1","F1","G1","H1")

## 3. Add the data to A1.phylo$edge.length
A1.phylo$edge.length <- df$length ## wrong!!!

1 Answer

The edge lengths in a "phylo" object are stored in the order of the rows of the edge table, while the tip labels and node labels are stored in the order of the tip and node numbers used in that table. You should therefore always make sure each vector is in the right order before assigning it. For example (sorry, I couldn't reproduce your example):

library(ape)
set.seed(1)
## A random tree with 4 tips, hence 6 edges
test_tree <- rtree(4)

## The edge table
test_tree$edge
#     [,1] [,2]
#[1,]    5    1
#[2,]    5    6
#[3,]    6    2
#[4,]    6    7
#[5,]    7    3
#[6,]    7    4

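Each row of this table is one edge: the first column is the parent node and the second column is the child, which is either a tip (numbers 1 to 4) or another internal node (numbers 5 and above). You can visualise the edges, nodes and tips (and their numbering) using plot: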

## Visualising all the elements
plot(test_tree, show.tip.label = FALSE)
edgelabels()
nodelabels()
tiplabels()
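
If a plot is not convenient, the same information can be read off the edge table directly. The following is a minimal sketch, assuming the same `rtree(4)` tree as above; the `edge_info` data frame and its column names are made up here for illustration and are not part of ape:

```r
library(ape)

set.seed(1)
test_tree <- rtree(4)
n_tips <- Ntip(test_tree)

## One row per edge, in the same order as test_tree$edge (and therefore
## in the order expected by test_tree$edge.length)
edge_info <- data.frame(
  edge         = seq_len(nrow(test_tree$edge)),
  parent       = test_tree$edge[, 1],
  child        = test_tree$edge[, 2],
  child_is_tip = test_tree$edge[, 2] <= n_tips
)

## Tip numbers index tip.label directly; children that are internal nodes
## fall outside tip.label and therefore get NA here
edge_info$child_label <- test_tree$tip.label[edge_info$child]

print(edge_info)
```

Each row of `edge_info` lines up with one element of `test_tree$edge.length`, so it can serve as a lookup when deciding which length belongs to which edge.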

So now if you have a data frame like this:

## A random data frame
df <- as.data.frame(rnorm(6))
names(df) <- "length"
## The edges in the "wrong" order
row.names(df) <- sample(1:6)

You can assign the lengths to the right edges by using:

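## Reorder the lengths by edge number (the row names are character
## strings, so convert them to integers before ordering)
test_tree$edge.length <- df$length[order(as.integer(rownames(df)))]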

In this case the sorting is easy because the row names of df are simply the edge numbers. The logic is that the first element of test_tree$edge.length must be the length of the first edge in the edge table (the one connecting node 5 to tip 1), the second element the length of the second edge, and so on.
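
When the rows of the data frame are named after tip and node labels rather than edge numbers, the same logic can be applied by looking up the label of the child at the end of each edge. This is only a sketch under a few assumptions: the tree has both `tip.label` and `node.label`, every non-root label appears exactly once as a row name, and the small Newick tree and the `child_labels` helper below are made up for illustration:

```r
library(ape)

## A small tree with labelled tips (A-D) and labelled internal nodes
tree <- read.tree(text = "((A,B)N1,(C,D)N2)Root;")

## Tips are numbered 1..Ntip and internal nodes Ntip+1 onwards, so
## concatenating the two label vectors gives a lookup indexed by node number
all_labels <- c(tree$tip.label, tree$node.label)

## Label of the child (second column) of every edge, in edge-table order
child_labels <- all_labels[tree$edge[, 2]]

## A data frame whose rows are named by label, deliberately shuffled
set.seed(1)
df <- data.frame(length = runif(6, 1, 10),
                 row.names = sample(child_labels))

## Fail early if the labels and the row names do not match one-to-one
stopifnot(setequal(child_labels, rownames(df)),
          !anyDuplicated(child_labels))

## Index the data frame by label so each edge gets the value of its own row
tree$edge.length <- df[child_labels, "length"]
```

The root has no incoming edge, so its label needs no row in the data frame. The same indexing should carry over to `A1.phylo` from the question, provided the converted object actually carries a `node.label` element.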

Again, as your example is not reproducible, it's hard to tell exactly what's wrong, but I would guess that your df$length is not in the same order as the edges of the tree.

Thomas Guillerme
  • Thanks Thomas. I added "library(data.tree)", which should make it reproducible. The scenario I'm thinking of is one where you don't know the order, and you want to set the edge.length associated with tip "H1" to the value of row "H1" in the data.frame, and so on. – Erling Lundevaller Nov 27 '18 at 06:03
  • Thanks Erling, your example now works indeed! You can check the answer I gave to [this question](https://stackoverflow.com/questions/51696837/r-phylo-object-how-to-connect-node-label-and-node-number/51739985#51739985) to translate the edge table into more interpretable data (i.e. which edge links to which node). You can then decide in which order you want your `df$length` to be passed to `tree$edge.length` using the same logic as I described in this answer. I hope it makes sense. – Thomas Guillerme Nov 27 '18 at 06:39