How to colour the branches of an unrooted tree using a variable in R_II

Question

I would like to color branches for particular labels in an unrooted neighbour joining trees from input haplotype data. These two examples from akang and Will Hamilton are provided. They tried to address this question but the answers didn't match the correct colors for defined labels.

First example. Here we want to color the three groups from Iris dataframe (setosa, versicolor, virginica) in a neighbour joining tree. I included rownames(Iris) to define a name for each sample in order to identify samples in the plot

library(ape)
head(iris)
prefix <- "Sample"
suffix <- seq(1:150)
Sample <- paste(prefix,suffix, sep ="."); Sample
iris$Sample <- Sample
rownames(iris) <- iris$Sample

D <-dist(as.matrix(iris[, 1:4]))
tree <- nj(D)
str(tree)
plot(tree, type = "unr", show.tip.lab = TRUE, cex=0.3, font=1, 
     edge.col=c("red","green","blue")[iris$Species])

Problem: Samples from each group are not colored properly. Some samples from different groups have the same color (i.e same color for samples from versicolor and virginica)

Second example from Will Hamilton. We want that Sample_A, Sample_B and Sample_E have the same orange color based on haplotype matrix.

Sample <- c('Sample_A', 'Sample_B', 'Sample_C', 'Sample_D', 'Sample_E', 'Sample_F')
SNP_A <- c(0, 1, 1, 0, 1, 1)
SNP_B <- c(0, 1, 1, 0, 1, 1)
SNP_C <- c(0, 0, 1, 1, 1, 0)
SNP_D <- c(1, 1, 0, 0, 1, 0)
SNP_E <- c(0, 0, 1, 1, 0, 1)
SNP_F <- c(0, 0, 1, 1, 0, 1)
df = data.frame(Sample, SNP_A, SNP_B, SNP_C, SNP_D, SNP_E, SNP_F, row.names=c(1))
df

pdist = dist(as.matrix(df), method = "euclidean") #Euclidean distance
phylo_nj <- nj(pdist)#neighbour joining method

#Sample samples but adding some information for each sample
Factor_A <- c('a', 'a', 'b', 'c', 'a', 'b')
Factor_B <- c('d', 'e', 'd', 'd', 'e', 'd')
df2 = data.frame(Sample, Factor_A, Factor_B)
df2

labels_a <- df2$Factor_A %in% "a" #logical vector finding "a" in Factor A
## Which edges connect to these labels?
edge_a <- phylo_nj$edge[,2] %in% match(phylo_nj$tip.label, df2$Sample[labels_a])
## Plotting the factors with the labels a coerced as numeric
plot(unroot(phylo_nj), type = "unrooted", edge.color = c("blue", "orange")[edge_a+1])

Problem: This second example doesn't work properly. Sample A, B, and E should be orange (i.e they are "a") but in the plot, E is blue. C is orange but is "b" (should be blue). I want that A,B,E = orange; C,D,F = blue.

Basically, I want to be able to assign colors to each edge from particular labels. Is this possible?

Inspect `?ape:::plot.phylo()`. It's the underlying method when you call `plot` on a `"phylo"` object. — jay.sf, Jun 27 '20 at 19:13
Yes, edge.color function is the one to color edges. It follows, as the second example, `phylo_nj$edge order`. The problem is that phylo_nj$edge has 9 rows for the total edges but there are 6 labels (SNP_A, B, C, D, E, F). Do you know the way to connect these two data sets?. I tried two different methods but didn't work. Thanks — Javier_HV, Jun 30 '20 at 14:24
I am afraid at the moment I am not familiar enough with this method to be able to help you. — jay.sf, Jun 30 '20 at 14:38

How to colour the branches of an unrooted tree using a variable in R_II

0 Answers0