
I want to create a function that makes a heatmap where the y axis has unique breaks but repeated, ordered labels. I know that this might not be great practice, and I am aware that similar questions have been asked before (for example: ggplot in R, reordering the bars). However, I want to get these repeated and ordered labels by sorting within a function, not by typing them manually. I am also aware of solutions for reordering axes based on the values of a factor (e.g., Order Bars in ggplot2 bar graph), but I don't think they apply, or I can't see how to apply them, to my case, where the breaks are unique but the labels repeat.

Here is some code to reproduce the problem and some of my attempts:

Libraries and data

library(ggplot2)
library(dplyr)
library(tidyr)
set.seed(4)
id  <- LETTERS[1:10]
lab <- paste(c("AB", "CD"), 1:5, sep = "_") %>% 
  sample(., size = 10, replace = TRUE)
val <- sample.int(n = 6, size = 10, replace = TRUE)
tes <- ifelse(val >= 4, 1, 0)
dat <- data.frame(id, lab, val, tes)

A heatmap with unique breaks on the y axis

dat2 <- dat %>% gather(kind, value, val:tes) 

ggplot(dat2) + 
  geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) 


A heatmap where the y axis is labeled with repeated labels instead of the unique breaks

This works insofar as the labels replace the unique ids, but the y axis is not ordered by the labels. Also, I am not sure about taking the breaks and labels from the wide data frame (dat) rather than from the long data frame that ggplot actually uses (dat2).

dat2 <- dat %>% gather(kind, value, val:tes) 

ggplot(dat2) + 
  geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1)  +
  scale_y_discrete(breaks=dat$id, labels=dat$lab)


Mapping the vector with repeated values to the y axis obviously doesn't work

dat2 <- dat %>% gather(kind, value, val:tes) 

ggplot(dat2) + 
  geom_tile(aes(x = kind, y = lab, fill = value), color="white", size=1)


Repeated and ordered labels, try 1

As expected, merely sorting the input data by the non-unique lab variable does not work.

dat2 <- dat %>% gather(kind, value, val:tes) %>%
  arrange(lab)

ggplot(dat2) +
  geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
  scale_y_discrete(breaks = dat$id, labels = dat$lab)


Repeated and ordered labels, try 2

Try to create a named breaks vector ordered by the (repeating) labels. This gets me nowhere: half the labels are missing, and the rest are still not sorted.

dat2 <- dat %>% gather(kind, value, val:tes) 
brks <- setNames(dat$id, dat$lab)[sort(dat$lab)]

ggplot(dat2) +
  geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
  scale_y_discrete(breaks = brks, labels = names(brks))


Repeated and ordered labels, try 3

Starting with the data frame sorted by label, try to create an ordered factor for lab. Then sort the table by this ordered factor. No luck.

dat2 <- dat %>% gather(kind, value, val:tes) %>% arrange(lab)
dat2 <- mutate(dat2, lab_f=factor(lab, levels=sort(unique(lab)), ordered = TRUE))
dat2 <- arrange(dat2, lab_f)
# check
dat2$lab_f

ggplot(dat2) +
  geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
  scale_y_discrete(breaks = dat2$id, labels = dat2$lab_f)


A workaround, which I can use if I have to, but I am trying to avoid

We can create a combination of lab and id, which is unique, and use it for the y axis:

dat2 <- dat %>% gather(kind, value, val:tes) %>% 
  mutate(id_lab=paste(lab, id, sep="_")) 

ggplot(dat2) + 
  geom_tile(aes(x = kind, y = id_lab, fill = value), color="white", size=1) 


I must be missing something. Any help is much appreciated.

The goal is to have a function that will take an arbitrarily long table and plot a y axis with unique breaks but (possibly) repeated and ordered labels.

heat <- function(dat) {
  dat2 <- dat %>% gather(kind, value, val:tes) 
  # any other manipulation here

  ggplot(dat2) +
    geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) 
  # scale_y_discrete() (if needed)
}

The plot I am looking for is something like this (created in Inkscape):

[image of the desired plot]

teofil

1 Answer


Using limits instead of breaks sets the order of the y axis; the labels then only need to be sorted the same way:

ggplot(dat2) + 
  geom_tile(aes(x = kind, y = id, fill = value), color="white", size=1) +
  geom_text(aes(x = 1, y = id, label = id), col = 'white') +
  scale_y_discrete(limits = dat$id[order(dat$lab)], labels = sort(dat$lab))
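
For the function the question asks about, the same idea could be wrapped up roughly as below. This is only a sketch and not part of the original answer: it assumes the input always has the columns id (unique), lab, val and tes from the question, the helper name ord is made up for illustration, and ties between equal labels are broken by id. The sample data are rebuilt with the question's code so that the block runs on its own.

```r
library(ggplot2)
library(tidyr)

# Sample data, built the same way as in the question
set.seed(4)
dat <- data.frame(
  id  = LETTERS[1:10],
  lab = sample(paste(c("AB", "CD"), 1:5, sep = "_"), size = 10, replace = TRUE),
  val = sample.int(n = 6, size = 10, replace = TRUE)
)
dat$tes <- ifelse(dat$val >= 4, 1, 0)

# Sketch of the requested function (assumes columns id, lab, val, tes)
heat <- function(dat) {
  # One permutation of the wide table, sorted by label and then by id,
  # so that limits and labels stay aligned position by position
  ord <- order(dat$lab, dat$id)

  dat2 <- gather(dat, kind, value, val:tes)

  ggplot(dat2) +
    # size = 1 as in the question; newer ggplot2 versions prefer linewidth
    geom_tile(aes(x = kind, y = id, fill = value), color = "white", size = 1) +
    scale_y_discrete(
      limits = as.character(dat$id)[ord],  # unique breaks, in label order
      labels = as.character(dat$lab)[ord]  # repeated labels, same order
    )
}

p <- heat(dat)
print(p)
```

Indexing both vectors with the same ord means the labels cannot drift out of step with the ids, which is the risk when limits and labels are sorted separately.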
Axeman