
I would like to generate unrooted neighbour-joining trees from input haplotype data, and then colour the branches of the trees by a variable. I am using the packages ape and ggtree. The haplotypes and co-variables (metadata) are in two separate files with matching sample names. I have been able to produce trees and colour the tips of the trees by a variable, but not the tree branches.

Using mock data:

# Packages
library('ggplot2')
library('ape')
library('phangorn')
library('dplyr')
library('ggtree')
library('phylobase')

# Generate haplotype dataframe
Sample <- c('Sample_A', 'Sample_B', 'Sample_C', 'Sample_D', 'Sample_E', 'Sample_F')
SNP_A <- c(0, 1, 1, 0, 1, 1)
SNP_B <- c(0, 1, 1, 0, 1, 1)
SNP_C <- c(0, 0, 1, 1, 1, 0)
SNP_D <- c(1, 1, 0, 0, 1, 0)
SNP_E <- c(0, 0, 1, 1, 0, 1)
SNP_F <- c(0, 0, 1, 1, 0, 1)
df = data.frame(Sample, SNP_A, SNP_B, SNP_C, SNP_D, SNP_E, SNP_F, row.names = 1)  # use the Sample column as row names
df

# Metadata
Factor_A <- c('a', 'a', 'b', 'c', 'a', 'b')
Factor_B <- c('d', 'e', 'd', 'd', 'e', 'd')
df2 = data.frame(Sample, Factor_A, Factor_B)
df2

# Generate Euclidean pairwise distance matrix
pdist = dist(as.matrix(df), method = "euclidean")

# Turn pairwise distance matrix into phylo via neighbour joining method
phylo_nj <- nj(pdist)
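Because the haplotypes are coded 0/1, the squared Euclidean distance between two samples equals the number of SNPs at which they differ (each differing SNP contributes (1 - 0)^2 = 1). A minimal check, rebuilding the mock data as a matrix called `snps` (an illustrative name, not from the question) and comparing against the `manhattan` method, which for 0/1 data counts mismatches directly:

```r
# Mock haplotypes from the question: rows = samples, columns = SNPs
# (matrix() fills column by column, so each group of six values is one SNP)
snps <- matrix(c(0, 1, 1, 0, 1, 1,   # SNP_A
                 0, 1, 1, 0, 1, 1,   # SNP_B
                 0, 0, 1, 1, 1, 0,   # SNP_C
                 1, 1, 0, 0, 1, 0,   # SNP_D
                 0, 0, 1, 1, 0, 1,   # SNP_E
                 0, 0, 1, 1, 0, 1),  # SNP_F
               nrow = 6,
               dimnames = list(paste0("Sample_", LETTERS[1:6]),
                               paste0("SNP_", LETTERS[1:6])))

euc <- as.matrix(dist(snps, method = "euclidean"))
# For 0/1 data, the manhattan distance is the number of differing SNPs
mismatches <- as.matrix(dist(snps, method = "manhattan"))

# Squared Euclidean distance equals the mismatch count
print(all.equal(euc^2, mismatches))

# Sample_A and Sample_B differ only at SNP_A and SNP_B, so this is sqrt(2)
print(euc["Sample_A", "Sample_B"])
```

So two samples that differ at k SNPs sit sqrt(k) apart in the matrix passed to `nj()`.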

I can plot the tree in ape:

# Example tree plot using Ape
plot(unroot(phylo_nj),
     type="unrooted",
     cex=1,
     use.edge.length=TRUE,
     show.tip.label = TRUE,
     lab4ut="axial",
     edge.width=1.5)

And I can plot the tree in ggtree, adding variables to the tip points by colour/shape:

# Plotting in ggtree
mytree <- ggtree(phylo_nj, layout="equal_angle", size=0.5, linetype=1)
mytree

# Adding metadata variables to tree plot
mytree2 <- mytree %<+% df2 +
  geom_tippoint(aes(shape = Factor_A, colour = Factor_B),
                size = 9, alpha = 0.7)
mytree2

But I can't work out how to colour the branches by a variable (rather than the tip points), in either ape or ggtree. I only want the terminal branches coloured, not all of the lines of the tree. My aim is to display two categorical variables: one by branch colour and one by tip shape (or colour). A crude version of what I'm after would look something like the image below, with Factor_A coded by tip shape (in a neutral colour, as shown) and Factor_B coded by branch colour.

[Image: sketch of the desired unrooted tree, with tip shape showing Factor_A and terminal branch colour showing Factor_B]

Thanks in advance for the help.

Will Hamilton

1 Answer

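After plotting the tree with ape::plot.phylo, you can colour specific edges with ape::edges, giving the start and end nodes of each edge to colour. Call it with the ape:: prefix: phylobase, which the question loads after ape, defines its own S4 generic edges() that masks ape's function, and that is what produces the "unable to find an inherited method" error.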

## Colour the edge joining nodes 7 and 8 with a red dashed line
## (check phylo_nj$edge for the node pairs that actually form edges)
plot(unroot(phylo_nj), type = "unrooted")
ape::edges(7, 8, col = "red", lty = 2)
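Rather than guessing node numbers, you can look them up in the edge table. A minimal sketch, assuming you want to highlight the terminal branch leading to Sample_A; `snps`, `tr`, `tip_id` and `parent_id` are illustrative names, and the tree is rebuilt from the question's mock data (equivalent to `phylo_nj`) so the block runs on its own:

```r
library(ape)

# Mock haplotypes from the question (rows = samples, columns = SNPs)
snps <- matrix(c(0, 1, 1, 0, 1, 1,   # SNP_A
                 0, 1, 1, 0, 1, 1,   # SNP_B
                 0, 0, 1, 1, 1, 0,   # SNP_C
                 1, 1, 0, 0, 1, 0,   # SNP_D
                 0, 0, 1, 1, 0, 1,   # SNP_E
                 0, 0, 1, 1, 0, 1),  # SNP_F
               nrow = 6,
               dimnames = list(paste0("Sample_", LETTERS[1:6]), NULL))
tr <- nj(dist(snps))

# Each row of tr$edge is one branch: column 1 = parent node, column 2 = child node.
# Tips are numbered 1..Ntip(tr), in the order of tr$tip.label.
tip_id <- match("Sample_A", tr$tip.label)
parent_id <- tr$edge[tr$edge[, 2] == tip_id, 1]

plot(tr, type = "unrooted")
# The ape:: prefix stops phylobase's edges() generic from masking ape's function
ape::edges(parent_id, tip_id, col = "red", lty = 2, lwd = 2)
```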

Or you can provide a vector of colours directly in the ape::plot.phylo function:

## Making rainbow edges (one colour per edge; this tree has 9)
plot(unroot(phylo_nj), type = "unrooted", edge.color = rainbow(Nedge(phylo_nj)))

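To decide which edges to colour from your data frame, use the edge table of the phylo object (phylo_nj$edge). Each row is one edge, given as parent node then child node; tips are numbered 1 to n in the order of phylo_nj$tip.label, so an edge leads to a tip when its child number is at most the number of tips. For example: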

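## Which samples have level "a" for Factor_A?
labels_a <- df2$Factor_A %in% "a"

## Tip numbers of those samples: look the sample names up in tip.label
## (not the other way round, or the colours land on the wrong tips)
tips_a <- match(df2$Sample[labels_a], phylo_nj$tip.label)

## Which edges end at these tips?
edge_a <- phylo_nj$edge[, 2] %in% tips_a

## Edges leading to "a" tips in orange, all others in blue
plot(unroot(phylo_nj), type = "unrooted", edge.color = c("blue", "orange")[edge_a + 1])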
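Since the question also uses ggtree, here is a sketch of what I understand to be the equivalent there. After `%<+%` joins the metadata by its first column (the tip labels), internal nodes have `NA` for `Factor_A` and `Factor_B`, and ggtree draws each branch with the data of the node it leads to, so mapping colour to `Factor_B` should colour only the terminal branches, with internal ones taking the scale's `na.value`. The object names (`snps`, `meta`, `tr`, `p`) and the grey values are illustrative choices:

```r
library(ape)
library(ggplot2)
library(ggtree)

# Mock haplotypes from the question (rows = samples, columns = SNPs)
snps <- matrix(c(0, 1, 1, 0, 1, 1,   # SNP_A
                 0, 1, 1, 0, 1, 1,   # SNP_B
                 0, 0, 1, 1, 1, 0,   # SNP_C
                 1, 1, 0, 0, 1, 0,   # SNP_D
                 0, 0, 1, 1, 0, 1,   # SNP_E
                 0, 0, 1, 1, 0, 1),  # SNP_F
               nrow = 6,
               dimnames = list(paste0("Sample_", LETTERS[1:6]), NULL))
tr <- nj(dist(snps))

# Metadata from the question; the first column must match tr$tip.label
meta <- data.frame(Sample   = paste0("Sample_", LETTERS[1:6]),
                   Factor_A = c("a", "a", "b", "c", "a", "b"),
                   Factor_B = c("d", "e", "d", "d", "e", "d"))

p <- ggtree(tr, layout = "equal_angle") %<+% meta +
  aes(colour = Factor_B) +                    # branch colour: only tips carry a value
  geom_tippoint(aes(shape = Factor_A), colour = "grey30", size = 4) +
  scale_colour_discrete(na.value = "grey60")  # colour used for internal branches
print(p)
```

The fixed `colour = "grey30"` in `geom_tippoint()` keeps the tip symbols neutral, so colour encodes only Factor_B and shape only Factor_A.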

You can extend the edge-table method to more than two levels in the same way: for each level, find the tips that carry it, then the edges ending at those tips, and give those edges that level's colour.
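A sketch of that extension for the plot described in the question, with Factor_B as terminal-branch colour and Factor_A as tip shape, using ape only. Each tip's metadata row is looked up by name, so the row order of the metadata does not matter. The palette `b_cols`, the symbols in `a_pch`, the grey for internal branches and the other object names are illustrative assumptions:

```r
library(ape)

# Mock haplotypes from the question (rows = samples, columns = SNPs)
snps <- matrix(c(0, 1, 1, 0, 1, 1,   # SNP_A
                 0, 1, 1, 0, 1, 1,   # SNP_B
                 0, 0, 1, 1, 1, 0,   # SNP_C
                 1, 1, 0, 0, 1, 0,   # SNP_D
                 0, 0, 1, 1, 0, 1,   # SNP_E
                 0, 0, 1, 1, 0, 1),  # SNP_F
               nrow = 6,
               dimnames = list(paste0("Sample_", LETTERS[1:6]), NULL))
tr <- nj(dist(snps))

meta <- data.frame(Sample   = paste0("Sample_", LETTERS[1:6]),
                   Factor_A = c("a", "a", "b", "c", "a", "b"),
                   Factor_B = c("d", "e", "d", "d", "e", "d"))

# Metadata row for each tip, matched by sample name
row_of_tip <- match(tr$tip.label, meta$Sample)

# Terminal edges are those whose child node is a tip (numbered 1..Ntip)
child <- tr$edge[, 2]
is_terminal <- child <= Ntip(tr)

# Branch colour by Factor_B on terminal edges; internal edges stay grey
b_cols <- c(d = "orange", e = "steelblue")
edge_col <- rep("grey60", Nedge(tr))
edge_col[is_terminal] <- b_cols[meta$Factor_B[row_of_tip[child[is_terminal]]]]

# Tip symbol by Factor_A (filled square, circle, triangle)
a_pch <- c(a = 15, b = 16, c = 17)

plot(tr, type = "unrooted", edge.color = edge_col, edge.width = 2,
     show.tip.label = TRUE, lab4ut = "axial")
tiplabels(pch = a_pch[meta$Factor_A[row_of_tip]], col = "grey20", cex = 1.5)
legend("topleft", legend = names(b_cols), col = b_cols, lwd = 2,
       title = "Factor_B", bty = "n")
legend("bottomleft", legend = names(a_pch), pch = a_pch,
       title = "Factor_A", bty = "n")
```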

Thomas Guillerme
  • Your first line ("edges(7, 8, col = "red", lty = 2)") gives an error message for me: "Error in (function (classes, fdef, mtable): unable to find an inherited method for function ‘edges’ for signature ‘"numeric"’". The problem with your second line is that I'm not clear how to link the colour to the metadata variables, e.g. colouring edges by Factor_A and tip shape by Factor_B in my example code. – Will Hamilton Feb 19 '20 at 17:17
  • I have updated the answer showing an example on how to detect the levels on edges leading to tips. – Thomas Guillerme Feb 21 '20 at 09:47
  • Hi, thanks for your reply. I have edited my example code to include a third category of Factor_A so it can't just be either/or. The main problem is that the branch colours don't match the input data. For example, df2 has Sample_A, Sample_B and Sample_E as "a" for Factor_A, but the plot produced with your code has the branches of Sample_A, Sample_B and Sample_C coloured in orange, i.e. a mismatch. I've tried different categories of Factor_A and Factor_B, but the branches consistently don't match df2. Once that issue is solved I need to work on having multiple colours and tip shapes! – Will Hamilton Feb 21 '20 at 15:33