
I am trying to loop over a set of columns that contain only numbers. In each iteration I convert the column from character to numeric and then try to plot it. A basic example of my code is:

library(ggtree) 
library(treeio)
library(tidyverse)
library(ggnewscale)
library(ggtreeExtra)
library(argparse)
library(RColorBrewer)
library(rlist)
library(stringr)

tree <- read.tree("/...") #PLEASE REPLACE THIS WITH THE LOCATION TO 'tree_newick.nwk'

tipcategories = read.csv("....", # PLEASE REPLACE THIS WITH THE LOCATION TO 'plot.tsv'
                     sep = " ",
                     header = TRUE,
                     stringsAsFactors = FALSE)

dd = as.data.frame(tipcategories)

p <- ggtree(tree) + ylim(-1, NA) + theme_tree2() 

p <- p %<+% dd + geom_tiplab(size=1)   

n <- 60
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, 
rownames(qual_col_pals)))

columns = c("Column1", "Column2")

for (col in columns) {

  p <- p + new_scale_fill()

  dd[[col]] <- as.numeric(as.character(dd[[col]]))

  p <- p + geom_fruit(geom=geom_tile, mapping=aes(fill=dd[[col]]), width=2, offset=0.05) +
    scale_fill_continuous(name=col, low='blue', high='red')

}

p <- p + theme(legend.text = element_text(size = 5), legend.key.size = unit(0.3, 'cm'))

ggsave("....") # PLEASE REPLACE THIS WITH WHERE YOU WANT TO SAVE IT

The tree data (save it to a file and replace the dots in `read.tree` with the path to that file):

(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);

The metadata file (save it to a file and replace the dots in `read.csv` with the path to that file):

Accession1 Column1 Column2   
A 10 130
B 20 120
C 30 110 
D 40 100
E 50 90
F 60 80 
G 70 70
H 80 60
I 90 50
J 100 40
K 110 30
L 120 20
M 130 10

The above works fine for just one column. However, when I try to plot two columns, the second column always overwrites the first, so the first ends up looking exactly the same as the second. The image below shows the result of running the program normally.


The first column (column1) is actually supposed to look like this:

Could anyone provide help as to how to fix this?

Sinh Nguyen
Yasir
  • It’s very hard to help without sample data to test with. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – MrFlick Apr 14 '21 at 05:37
  • Hi, I apologise for that, as I didn't know what you meant. I hope it is now reproducible (please let me know otherwise). – Yasir Apr 14 '21 at 06:54
  • You can `dput` the data to make it easier to copy and paste into an R script ;) – Sinh Nguyen Apr 14 '21 at 07:08

1 Answer


It really took a lot of time to reproduce your case, as you use so many packages that I don't :)

Explanation of the issue: ggplot does not render anything at the moment you call a geom and pass it data and an aes mapping. It only stores the expression given in the mapping and evaluates it when the plot is finally rendered. In your case the mapping is dd[[col]]. The value of col changes on each pass through the for loop, but both layers refer to the same col, so by render time both see its last value, Column2, and you get two bars showing the same data. You can verify this by swapping the order so that Column1 comes last; you will then see two bars of Column1 instead.
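
The delayed lookup can be reproduced without any plotting package. The following is a minimal base R sketch, not ggplot2's actual implementation: `quote()` stands in for the way `aes()` keeps an expression unevaluated, and the helper `capture_in_loop()` and the three-row data frame are made up for illustration.

```r
# Hypothetical helper: stores dd[[col]] unevaluated on every iteration,
# then evaluates the stored expressions only after the loop has finished.
capture_in_loop <- function(dd, columns) {
  captured <- list()
  for (col in columns) {
    # quote() keeps the expression as written; `col` is not looked up yet
    captured[[col]] <- quote(dd[[col]])
  }
  # By now `col` holds the last element of `columns`
  env <- environment()
  lapply(captured, eval, envir = env)
}

# Toy data, values chosen only for illustration
toy <- data.frame(Column1 = c(10, 20, 30), Column2 = c(30, 20, 10))
late <- capture_in_loop(toy, c("Column1", "Column2"))

# Both stored expressions end up reading Column2
identical(late$Column1, toy$Column2)
identical(late$Column2, toy$Column2)
```

Both stored expressions are the same piece of code, `dd[[col]]`, so they can only differ if `col` differs at the time they are evaluated.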

Solution: create a unique reference in each iteration of the loop.
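
In base R terms, a unique reference per iteration means substituting the current value of `col` into the stored expression before the loop moves on, which is roughly what `!!` does inside `aes()`. A minimal sketch using `bquote()`, with a made-up helper `capture_with_injection()` and toy data; it illustrates the idea only and is not how ggplot2 is implemented:

```r
# Hypothetical helper: injects the current column name into each stored
# expression, so every expression refers to its own column.
capture_with_injection <- function(dd, columns) {
  captured <- list()
  for (col in columns) {
    # .(col) is replaced by the current string, e.g. dd[["Column1"]]
    captured[[col]] <- bquote(dd[[.(col)]])
  }
  # Evaluation still happens after the loop, but no longer depends on `col`
  env <- environment()
  lapply(captured, eval, envir = env)
}

# Toy data, values chosen only for illustration
toy <- data.frame(Column1 = c(10, 20, 30), Column2 = c(30, 20, 10))
fixed <- capture_with_injection(toy, c("Column1", "Column2"))

# Each entry now holds the values of its own column
identical(fixed$Column1, toy$Column1)
identical(fixed$Column2, toy$Column2)
```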

Initial setup with data in dput format

library(ggtree)
library(treeio)
library(tidyverse)
library(ggnewscale)
library(ggtreeExtra)
library(argparse)
library(RColorBrewer)
library(rlist)
library(stringr)

tree <- structure(list(edge = structure(c(14L, 15L, 16L, 17L, 18L, 19L, 
  20L, 20L, 19L, 18L, 17L, 16L, 21L, 22L, 22L, 21L, 15L, 23L, 24L, 
  24L, 23L, 25L, 25L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 
  3L, 4L, 5L, 21L, 22L, 6L, 7L, 8L, 23L, 24L, 9L, 10L, 25L, 11L, 
  12L, 13L), .Dim = c(24L, 2L)), edge.length = c(4, 13, 10, 3, 
    8, 6, 4, 4, 5, 6, 21, 13, 14, 4, 12, 8, 17, 30, 5, 2, 2, 11, 
    11, 56), Nnode = 12L, tip.label = c("A", "B", "C", "D", "E", 
      "F", "G", "H", "I", "J", "K", "L", "M")), class = "phylo",
  order = "cladewise")

tipcategories <- structure(
  list(Accession1 = c("A", "B", "C", "D", "E", "F", "G", 
    "H", "I", "J", "K", "L", "M"), Column1 = c(10L, 20L, 30L, 40L, 
      50L, 60L, 70L, 80L, 90L, 100L, 110L, 120L, 130L), Column2 = c(130L, 
        120L, 110L, 100L, 90L, 80L, 70L, 60L, 50L, 40L, 30L, 20L, 10L
      ), X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    X.1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
    ), X.2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
      NA)), class = "data.frame", row.names = c(NA, -13L))

Your code, with the plot generation modified so that the layers no longer share one variable, which is what caused the issue in the OP:

dd <- as.data.frame(tipcategories)

p <- ggtree(tree) + ylim(-1, NA) + theme_tree2()

p <- p %<+% dd + geom_tiplab(size = 1)

n <- 60
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == "qual", ]
col_vector <- unlist(mapply(
  brewer.pal, qual_col_pals$maxcolors,
  rownames(qual_col_pals)
))

columns <- c("Column1", "Column2")

for (col in columns) {
  p <- p + new_scale_fill()

  # assign the numeric values of dd[[col]] to a new variable named after the column
  assign(col, as.numeric(as.character(dd[[col]])))
  
  # use bang-bang (!!) and sym() to inject the column name as a symbol into aes();
  # each layer then refers to its own variable when the plot is finally
  # rendered at the end
  p <- p + geom_fruit(geom = geom_tile, mapping = aes(fill = !!sym(col)),
    width = 2, offset = 0.05) +
    scale_fill_continuous(name = col, low = "blue", high = "red")
}

p <- p + theme(legend.text = element_text(size = 5),
  legend.key.size = unit(0.3, "cm"))

p

Created on 2021-04-15 by the reprex package (v2.0.0)
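
The injection pattern can also be tried in isolation with plain ggplot2. The sketch below uses made-up data and builds one plot per column instead of stacking `geom_fruit` layers, so it only demonstrates that `!!sym(col)` gives each mapping its own column. Here the symbols are resolved against the columns of the plot data, with no `assign()` step. Whether the same holds inside `geom_fruit`, where `%<+%` has attached `dd` to the tree data, is an assumption that has not been checked here.

```r
library(ggplot2)

# Toy data, values chosen only for illustration
dd <- data.frame(
  Accession1 = c("A", "B", "C"),
  Column1 = c(10, 20, 30),
  Column2 = c(30, 20, 10)
)

plots <- list()
for (col in c("Column1", "Column2")) {
  # !!rlang::sym(col) turns the string into the symbol Column1 or Column2
  # right now, so the mapping no longer depends on the loop variable
  plots[[col]] <- ggplot(dd, aes(x = 1, y = Accession1,
                                 fill = !!rlang::sym(col))) +
    geom_tile() +
    scale_fill_continuous(name = col, low = "blue", high = "red")
}

# Compare the fill colours computed for each plot
fill1 <- layer_data(plots$Column1)$fill
fill2 <- layer_data(plots$Column2)$fill
identical(fill1, fill2)
```

If the two fill vectors differ, each plot kept its own column.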

Sinh Nguyen
  • In one `ggplot` you can only have one type per `aes`, so if you already have a continuous `fill` you cannot have a discrete `fill` on that same plot. If you ask another question with more detail about what you want to achieve, it would be easier to discuss what can be done. – Sinh Nguyen Apr 16 '21 at 00:43
  • In the past I have had success with manually plotting different types of scales (using new_scale_fill()). I am unsure as to why it is not working now. Regardless, thank you so much for your reply, the information you have provided is very useful. – Yasir Apr 16 '21 at 00:51
  • I hadn't used the `ggnewscale` package much before encountering your question. I think it may work out. You can then just try switching between `scale_fill_continuous` and `scale_fill_discrete` or `scale_fill_manual` based on the current `col` – Sinh Nguyen Apr 16 '21 at 00:57
  • Sorry, but what if there is a letter in the column? I tried to convert letters into numbers before `assign(col, ...)`, but the error that came up was "Discrete value supplied to continuous scale". I know this is due to the letters. How do I fix this issue? – Yasir Apr 16 '21 at 03:43
  • Without the data it is really difficult to confirm whether the proposal would work. As the OP is about graphing, which has already been answered, I think it would be best to ask a new question narrowed down to the specific challenge you are having, so others can support you better. – Sinh Nguyen Apr 16 '21 at 04:53
  • I have managed to solve the prior issue. Thanks again for your extensive help! – Yasir Apr 18 '21 at 22:32