R phylo object: how to connect node label and node number

Question

A phylo object in R can have internal node labels (phylo_obj$node.label), but many R functions use node numbers instead of node labels. Even the phylo object itself uses node numbers to describe the edges (phylo_obj$edge) and does not seem to provide a direct mapping from internal node labels to those node numbers. How do I map node labels (e.g., "NodeA" or "Artiodactyla") to node numbers (e.g., 250 or 212)? I can't find any R functions or documentation on this.

Can you give an example of a function that you would like to use, but requires the node number? It would help if you provided a small reproducible example, maybe starting with `phy <- rtree(n=10)` — G5W, Aug 05 '18 at 21:17
I believe that Thomas Guillerme has answered my question. There are some functions that require an integer specifying the internal node (e.g., `phangorn::Descendants`), but I wasn't sure how the node integer IDs mapped to the node labels (e.g., 1 <--> mammalia; 2 <--> aves, etc.). I don't want to use the wrong node integer and get the wrong descendants — sharchaea, Aug 10 '18 at 08:35

Thomas Guillerme · Accepted Answer · 2022-04-19T15:59:25.923

I'm not exactly sure what the objective is here, but if you want to pick specific node numbers from the edge table and find their equivalents in the node labels vector, you can simply use tree$node.label[node_number - Ntip(tree)].

In more detail:

## Loading the package that provides rtree, getMRCA and Ntip
library(ape)

## Simulating a random tree
set.seed(1)
my_tree <- rtree(10)
my_tree$node.label <- paste0("node", seq_len(9))

## Method 1: selecting a node of interest (e.g. MRCA)
mrca_node <- getMRCA(my_tree, tip = c("t1", "t2"))
mrca_node
#[1] 16

mrca_node is now the ID of the node in the edge table (in this case a number higher than 10, the number of tips). To get the equivalent node label, subtract the number of tips from mrca_node and use the result as an index into the node labels:

## The node label for the mrca_node
my_tree$node.label[mrca_node-Ntip(my_tree)]
#[1] "node6"

Alternatively, you can look up the labels for every node in the edge table:

## Method 2: directly extracting the nodes from the edge tables
# Function selecting the tip or node name corresponding to the edge row
select.tip.or.node <- function(element, tree) {
    ifelse(element < Ntip(tree)+1,
           tree$tip.label[element],
           tree$node.label[element-Ntip(tree)])
}

## Making the edge table
edge_table <- data.frame(
                "parent" = my_tree$edge[,1],
                "par.name" = sapply(my_tree$edge[,1],
                                    select.tip.or.node,
                                    tree = my_tree),
                "child" = my_tree$edge[,2],
                "chi.name" = sapply(my_tree$edge[,2],
                                    select.tip.or.node,
                                    tree = my_tree)
                )
#   parent par.name child chi.name
#1      11    node1    12    node2
#2      12    node2     1      t10
#3      12    node2    13    node3
#4      13    node3     2       t6
#5      13    node3     3       t9
#6      11    node1    14    node4
#7      14    node4    15    node5
#8      15    node5    16    node6
#9      16    node6     4       t1
#10     16    node6    17    node7
#11     17    node7     5       t2
#12     17    node7     6       t7
#13     15    node5     7       t3
#14     14    node4    18    node8
#15     18    node8    19    node9
#16     19    node9     8       t8
#17     19    node9     9       t4
#18     18    node8    10       t5
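
Going the other way, from a label to a node number (which is what the question asks for), can be sketched with the same offset. Node numbers index into the tip labels followed by the node labels, so a lookup with match() returns the node number directly. Note that label_to_node is a hypothetical helper name used here for illustration, not a function from ape, and the sketch assumes that labels are unique.

```r
library(ape)

set.seed(1)
my_tree <- rtree(10)
my_tree$node.label <- paste0("node", seq_len(Nnode(my_tree)))

# Hypothetical helper (not part of ape): tips are numbered 1..Ntip and
# internal nodes Ntip+1..Ntip+Nnode, so the position of a label in
# c(tip labels, node labels) is its node number.
# Returns NA for labels that are not in the tree.
label_to_node <- function(tree, labels) {
    match(labels, c(tree$tip.label, tree$node.label))
}

label_to_node(my_tree, "node6")
#[1] 16

# Works for tips and internal nodes alike, and is vectorised
label_to_node(my_tree, c("t1", "node1"))
```

The returned integer can then be passed to functions that expect a node number. If labels are duplicated, match() only reports the first occurrence, so it is worth checking anyDuplicated() on the combined label vector first.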

Looking back at this years later, I still don't see why the `ape::phylo` object isn't more clear about how the `phylo$edge` matrix matches up to `phylo$tip.label`. It seems like only a small change in the code would be needed to add row names to the `phylo$edge` matrix that at least includes the tip labels. — sharchaea, Sep 05 '21 at 14:28

score 0 · Answer 2 · answered Aug 27 '23 at 23:41

By default, the tips are numbered from 1 to n, where n is the number of tips. For example, the first tip in phylo$tip.label has node number 1.

The internal nodes are then numbered from n + 1 onward, in the same order as phylo$node.label, so the i-th node label belongs to node n + i. These are the node numbers that appear in the edge matrix phylo$edge.
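
As a rough illustration of this numbering, the sketch below builds a lookup table with one row per node for a small fixed tree. The Newick string and its labels are made up for the example, and node_table is just an illustrative variable name.

```r
library(ape)

# Small fixed tree with labelled internal nodes (labels are illustrative)
tree <- read.tree(text = "((A,B)AB,(C,D)CD)Root;")

n_tip  <- Ntip(tree)
n_node <- Nnode(tree)

# One row per node: tips occupy 1..n_tip,
# internal nodes occupy n_tip+1..n_tip+n_node
node_table <- data.frame(
    node  = seq_len(n_tip + n_node),
    label = c(tree$tip.label, tree$node.label),
    type  = rep(c("tip", "internal"), c(n_tip, n_node))
)
node_table

# Every node number used in the edge matrix has a row in the table
all(tree$edge %in% node_table$node)
```

Such a table can be merged with tree$edge on the node column to attach labels to both ends of every edge.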
