
I'm making a heatmap in R, but since I'm very new to R, I have some questions:

My data is a big matrix with 21 columns and 89 rows, containing counts from 0 to 16. I would like the heatmap colored as a gradient from 0 (white) to 16 (dark red, or any color for that matter). Or, even fancier, a color palette going from 0 to 10, so that all cells with more than 10 "hits" get the same color.

Can anyone help me with this? Thanks a lot!

My code:

library(ggplot2)
library("RColorBrewer")

AS <- read.csv("L:/Pseudoalteromonas/Heatmap antismash/HM_phyl.csv", sep=";")

row.names(AS) <- AS$Strain

AS <- AS[,2:21]

## The colors you specify.
my_palette <- colorRampPalette(c("white", "yellow","orange", "red"))(n = 299)

AS_matrix <- data.matrix(AS)

AS_heatmap <- heatmap(AS_matrix, Rowv=NA, Colv=NA, col = my_palette, scale="row", margins=c(5,10))

My data looks like this:

tail(HM)
      Sideophore Bacteriocin Aryl.polyene Nrps T1pks T2pks T3pks T1pks.Nrps Lantipeptide Terpene Hserlactone Transatpks
S4048          0           2            0    2     0     0     0          0            1       0           0          1
S3655          1           2            2    0     0     0     0          0            0       0           0          0
S4060          0           2            0    7     0     1     1          2            1       0           0          1
S2607          0           2            0   10     1     1     1          4            1       0           0          1
S4054          0           2            1    3     0     0     0          4            1       0           1          1
S4047          0           2            1    7     0     0     0          4            1       0           1          1
      Butyrolactone Indole Thiopeptide Ladderane Pufa Resorcinol Otherks Other
S4048             0      0           0         0    0          0       0     0
S3655             0      0           0         0    0          1       0     0
S4060             0      1           0         0    0          0       0     2
S2607             0      1           0         0    0          0       0     2
S4054             0      1           0         1    0          0       0     0
S4047             0      1           0         1    0          0       0     2
Sara
  • You need to include the breaks, for example : `color_breaks = c(seq(0,2.5,length=100),seq(2.5,5,length=100),seq(5,7.5,length=100), seq(7.5,10,length=100))`, and then : `AS_heatmap <- heatmap(AS_matrix, Rowv=NA, Colv=NA, col = my_palette, scale="row", breaks=color_breaks, margins=c(5,10))` – S Rivero Nov 06 '17 at 16:20
  • Thanks a lot :-), but unfortunately it does not help. I get the error message: Error in image.default(1L:nc, 1L:nr, x, xlim = 0.5 + c(0, nc), ylim = 0.5 + : must have one more break than colour – Sara Nov 06 '17 at 16:28
  • Try : `my_palette <- colorRampPalette(c("white", "yellow","orange", "red"))(n = 399)` – S Rivero Nov 06 '17 at 16:30
  • The error message disappears, but now the heatmap is only colored in different shades of yellow. My point is also that I would like the color scale to have 10 different colors, from white (0 hits) to >10 (more than ten hits). Furthermore, the color legend is not showing either; is there code for that? – Sara Nov 06 '17 at 16:34
  • It's colored in the colors you selected (from white-->yellow-->orange-->red). You may want to consider using one of the RColorBrewer palettes (i.e. sequential). If you want help creating a new palette you need to be more specific and provide some example data – S Rivero Nov 06 '17 at 16:44
  • Thank you so much for your help. I added some example data, if that will help. – Sara Nov 06 '17 at 16:53
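The error in the comments comes from a rule in `image()`, which `heatmap()` calls: there must be exactly one more break than colours. A minimal base-R sketch, assuming integer counts like the data shown above: cap everything above 10 with `pmin()`, use 11 colours for the counts 0 through 10, and 12 breaks centred on the integers. It also uses `scale = "none"`, because `scale = "row"` centres and scales each row (z-scores), so the colours no longer correspond to raw hit counts. The random matrix is a hypothetical stand-in for `AS_matrix`.

```r
# Hypothetical stand-in for AS_matrix: 89 strains x 20 columns of counts 0-16
set.seed(1)
AS_matrix <- matrix(sample(0:16, 89 * 20, replace = TRUE), nrow = 89,
                    dimnames = list(paste0("S", 1:89), paste0("col", 1:20)))

# Cap counts at 10 so every cell with 10 or more hits gets the darkest colour
capped <- pmin(AS_matrix, 10)

# 11 colours: one per count 0, 1, ..., 10 (0 = white)
my_palette <- colorRampPalette(c("white", "yellow", "orange", "darkred"))(11)

# 12 breaks (-0.5, 0.5, ..., 10.5): one more than the number of colours,
# so each integer count falls in the middle of its own colour bin
color_breaks <- seq(-0.5, 10.5, by = 1)

# scale = "none" keeps the raw counts instead of per-row z-scores
heatmap(capped, Rowv = NA, Colv = NA, scale = "none",
        col = my_palette, breaks = color_breaks, margins = c(5, 10))
```

Base `heatmap()` does not draw a colour key; the ggplot2 approach in the answer adds a legend automatically.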

1 Answer


You could keep the data as a data.frame and use ggplot2 (that seems to be what you intended, since you load ggplot2).

library(ggplot2)
library(RColorBrewer)
library(tidyverse)

set.seed(12343)
# create matrix with 21 columns and 89 rows
# with numbers between 0 - 16

AS <- runif(n= 1869, min = 0, max = 16) %>%
  matrix(., nrow = 89)

colnames(AS) <- LETTERS[1:21]

AS <- as.data.frame(AS)
AS$train <- 1:89

AS <- gather(AS, A:U, key = "colname", value="value")

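ggplot(AS, aes(x = colname, y = train)) +
  geom_tile(aes(fill = value), colour = "white") +
  # oob = scales::squish draws values above 10 in the darkest red of the scale
  scale_fill_distiller(palette = "Reds", limits = c(0, 10), oob = scales::squish,
                       direction = 1, breaks = c(0, 2.5, 5, 7.5, 10),
                       labels = c("0.0", "2.5", "5.0", "7.5", "> 10.0"))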

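If you want one distinct colour per hit count instead of a continuous gradient (as asked in the comments), one option is to bin the values with `cut()` and map the bins with `scale_fill_manual()`. A minimal sketch, assuming integer counts; the random data is a hypothetical stand-in, and `pivot_longer()` is tidyr's newer equivalent of `gather()`.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in data: 89 rows x 21 columns of integer counts 0-16
set.seed(12343)
AS <- as.data.frame(matrix(sample(0:16, 89 * 21, replace = TRUE), nrow = 89))
colnames(AS) <- LETTERS[1:21]
AS$train <- 1:89

# Long format: one row per (train, colname) cell
AS_long <- pivot_longer(AS, -train, names_to = "colname", values_to = "value")

# Bin counts as 0, 1, ..., 9 and ">= 10"; right = FALSE gives bins [0,1), [1,2), ...
AS_long$hits <- cut(AS_long$value, breaks = c(0:10, Inf), right = FALSE,
                    labels = c(0:9, ">= 10"))

# 11 colours, from white for 0 hits up to dark red for 10 or more
hit_colours <- colorRampPalette(c("white", "yellow", "orange", "darkred"))(11)

p <- ggplot(AS_long, aes(x = colname, y = train, fill = hits)) +
  geom_tile(colour = "white") +
  # drop = FALSE keeps every bin in the legend even if a count never occurs
  scale_fill_manual(values = hit_colours, drop = FALSE, name = "Hits")
print(p)
```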

So applied to your own file, maybe something like this:

library(ggplot2)
library(RColorBrewer)
library(tidyverse)


AS <- read.csv("L:/Pseudoalteromonas/Heatmap antismash/HM_phyl.csv", sep=";")

# "Strain" is the row identifier in your file, so gather all
# columns except Strain into colname/value pairs
AS <- gather(AS, -Strain, key = "colname", value = "value")


ggplot(AS, aes(x = colname, y = Strain)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_distiller(palette = "Reds", limits = c(0, 10), oob = scales::squish,
                       direction = 1, breaks = c(0, 2.5, 5, 7.5, 10),
                       labels = c("0.0", "2.5", "5.0", "7.5", "> 10.0"))

# oob = scales::squish draws values > 10 in the darkest red of the Reds scale
  • Yes, something like that! Perfect! My intention was to use ggplot2, but the only "help" I could find online used gplot, many of its commands did not work, and I am not yet good enough to make it work. I just tried toggling around with your code, but I'm completely blank. Can you help me write the code? On the x axis I want the column names and on the y axis the strain numbers... thank you so much!! – Sara Nov 06 '17 at 17:39
  • Are you looking to do clustering or are you interested in just showing the raw value in each of the cells? – James Thomas Durant Nov 06 '17 at 18:03
  • No, no clustering, only showing the raw value in each of the cells! Thanks! – Sara Nov 06 '17 at 18:10
  • Edited the answer to add some code at the bottom. It might work. – James Thomas Durant Nov 06 '17 at 18:18
  • Thank you so much for your kind help, this helped tremendously! Can I ask one last thing? In the csv file, I had the strain numbers and the column names ordered the way I wanted (so that I can merge the heatmap with another figure). Now the strain numbers are ordered from low to high and the column names alphabetically. Can I make R keep the original order somehow? Thank you – Sara Nov 07 '17 at 08:13
  • You are probably getting the strain column in as a "factor" variable. You could try adding as.is = TRUE to the arguments of read.csv and see if that helps. – James Thomas Durant Nov 07 '17 at 12:17
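A caveat on the last comment: `as.is = TRUE` keeps `Strain` as character, but ggplot2 still sorts a character axis alphabetically. One common way to keep the order from the CSV file is to make both axis variables factors whose levels follow the file order. A minimal sketch, using the first four columns of the six strains shown in the question; `pivot_longer()` is tidyr's newer equivalent of `gather()`.

```r
library(ggplot2)
library(tidyr)

# The first four columns of the six strains shown in the question, in file order
AS <- data.frame(
  Strain       = c("S4048", "S3655", "S4060", "S2607", "S4054", "S4047"),
  Sideophore   = c(0, 1, 0, 0, 0, 0),
  Bacteriocin  = c(2, 2, 2, 2, 2, 2),
  Aryl.polyene = c(0, 2, 0, 0, 1, 1),
  Nrps         = c(2, 0, 7, 10, 3, 7)
)

AS_long <- pivot_longer(AS, -Strain, names_to = "colname", values_to = "value")

# Factor levels in file order stop ggplot2 from sorting the axes alphabetically.
# rev() puts the first strain in the file at the top of the y axis.
AS_long$Strain  <- factor(AS_long$Strain, levels = rev(AS$Strain))
AS_long$colname <- factor(AS_long$colname, levels = setdiff(names(AS), "Strain"))

p <- ggplot(AS_long, aes(x = colname, y = Strain)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_distiller(palette = "Reds", direction = 1, limits = c(0, 10),
                       oob = scales::squish)
print(p)
```

With the full file, the same two `factor()` lines apply unchanged, since the levels are taken from `AS` itself rather than typed by hand.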