I would like to color branches for particular labels in an unrooted neighbour joining trees from input haplotype data. These two examples from akang and Will Hamilton are provided. They tried to address this question but the answers didn't match the correct colors for defined labels.
First example. Here we want to color the three groups from Iris dataframe (setosa, versicolor, virginica) in a neighbour joining tree. I included rownames(Iris) to define a name for each sample in order to identify samples in the plot
library(ape)
head(iris)
prefix <- "Sample"
suffix <- seq(1:150)
Sample <- paste(prefix,suffix, sep ="."); Sample
iris$Sample <- Sample
rownames(iris) <- iris$Sample
D <-dist(as.matrix(iris[, 1:4]))
tree <- nj(D)
str(tree)
plot(tree, type = "unr", show.tip.lab = TRUE, cex=0.3, font=1,
edge.col=c("red","green","blue")[iris$Species])
Problem: Samples from each group are not colored properly. Some samples from different groups have the same color (i.e same color for samples from versicolor and virginica)
Second example from Will Hamilton. We want that Sample_A, Sample_B and Sample_E have the same orange color based on haplotype matrix.
Sample <- c('Sample_A', 'Sample_B', 'Sample_C', 'Sample_D', 'Sample_E', 'Sample_F')
SNP_A <- c(0, 1, 1, 0, 1, 1)
SNP_B <- c(0, 1, 1, 0, 1, 1)
SNP_C <- c(0, 0, 1, 1, 1, 0)
SNP_D <- c(1, 1, 0, 0, 1, 0)
SNP_E <- c(0, 0, 1, 1, 0, 1)
SNP_F <- c(0, 0, 1, 1, 0, 1)
df = data.frame(Sample, SNP_A, SNP_B, SNP_C, SNP_D, SNP_E, SNP_F, row.names=c(1))
df
pdist = dist(as.matrix(df), method = "euclidean") #Euclidean distance
phylo_nj <- nj(pdist)#neighbour joining method
#Sample samples but adding some information for each sample
Factor_A <- c('a', 'a', 'b', 'c', 'a', 'b')
Factor_B <- c('d', 'e', 'd', 'd', 'e', 'd')
df2 = data.frame(Sample, Factor_A, Factor_B)
df2
labels_a <- df2$Factor_A %in% "a" #logical vector finding "a" in Factor A
## Which edges connect to these labels?
edge_a <- phylo_nj$edge[,2] %in% match(phylo_nj$tip.label, df2$Sample[labels_a])
## Plotting the factors with the labels a coerced as numeric
plot(unroot(phylo_nj), type = "unrooted", edge.color = c("blue", "orange")[edge_a+1])
Problem: This second example doesn't work properly. Sample A, B, and E should be orange (i.e they are "a") but in the plot, E is blue. C is orange but is "b" (should be blue). I want that A,B,E = orange; C,D,F = blue.
Basically, I want to be able to assign colors to each edge from particular labels. Is this possible?