
I'm trying to make a heatmap with ggplot2 using the geom_tile function. Here is my code:

p<-ggplot(data,aes(Treatment,organisms))+geom_tile(aes(fill=S))+
  scale_fill_gradient(low = "black",high = "red") + 
  scale_x_discrete(expand = c(0, 0)) + 
  scale_y_discrete(expand = c(0, 0)) + 
  theme(legend.position = "right", 
    axis.ticks = element_blank(), 
    axis.text.x = element_text(size = base_size, angle = 90, hjust = 0, colour = "black"),
    axis.text.y = element_text(size = base_size, hjust = 1, colour = "black"))

data is my data.csv file
my X axis is types of Treatment
my Y axis is types of organisms

I'm not too familiar with commands and programming and I'm relatively new at this. I just want to be able to specify the order of the labels on the x axis. In this case, I'm trying to specify the order of "Treatment". By default, it orders alphabetically. How do I override this/keep the data in the same order as in my original csv file?

I've tried this command

scale_x_discrete(limits=c("Y","X","Z"))

where X, Y and Z stand for my treatment conditions in the order I want. However, it doesn't work well: some tiles of the heatmap go missing.

Gregor Thomas
Lisa Ta
  • Voting to reopen since the linked dup is about reordering by frequency, whereas this question is about imposing an arbitrary order, and is a useful dup target for that case. – zephryl Jun 13 '23 at 12:38

2 Answers


It is a little difficult to answer your specific question without a full, reproducible example. However, something like this should work:

#Turn your 'treatment' column into a character vector
data$Treatment <- as.character(data$Treatment)
#Then turn it back into a factor with the levels in the correct order
data$Treatment <- factor(data$Treatment, levels=unique(data$Treatment))

In this example, the factor levels will be in the order in which each treatment first appears in the data.csv file.
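
To see the difference, here is a minimal sketch with a small made-up vector (the values are hypothetical, not the asker's data). By default, factor() sorts the levels alphabetically, while levels = unique(x) keeps the order of first appearance:

```r
# Hypothetical treatment column, in the order it appears in the file
treatment <- c("Y", "Y", "X", "Z", "X")

# Default behaviour: levels are sorted alphabetically
default_levels <- levels(factor(treatment))
print(default_levels)     # "X" "Y" "Z"

# With unique(): levels follow the order of first appearance
file_order_levels <- levels(factor(treatment, levels = unique(treatment)))
print(file_order_levels)  # "Y" "X" "Z"
```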

If you prefer a different order, you can order them by hand:

data$Treatment <- factor(data$Treatment, levels=c("Y", "X", "Z"))

However, this is risky if you have a lot of levels: any value in the data that does not exactly match one of the levels you typed (for example, because of a typo) is silently converted to NA.
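
A minimal sketch of that failure mode, again with made-up values. The setdiff() check is just one possible way to catch mismatches before converting; it is a suggestion, not part of the original approach:

```r
# Hypothetical treatment column
treatment <- c("Y", "X", "Z", "X")

# Hand-written levels with a typo: lowercase "z" instead of "Z"
bad_levels <- c("Y", "X", "z")

# Values present in the data but absent from the level vector
unmatched <- setdiff(unique(treatment), bad_levels)
print(unmatched)  # "Z"

# Those values silently turn into NA when the factor is built
converted <- factor(treatment, levels = bad_levels)
print(sum(is.na(converted)))  # 1
```

If `unmatched` is not empty, fix the level vector before plotting; otherwise the affected rows may show up as an NA category or as gaps in the plot.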

Gregor Thomas
Drew Steen
  • 39
    Have to wonder why this is even necessary. Why are the axes reordered by ggplot in the first place? Seems dangerous if someone isn't aware this is going to happen. – Dirk Calloway Mar 25 '14 at 10:27
  • 1
    I just ran into this problem myself making a [heatmap with qplot](http://martinsbioblogg.wordpress.com/2013/03/21/using-r-correlation-heatmap-with-ggplot2/) and automatically applied variable names. Should it be reported? – bright-star Mar 25 '14 at 10:29
  • 11
    @DirkCalloway, this behavior makes sense if you think about how factors work in R. A factor is a vector of integers, each of which is associated with a character 'label'. When you create a factor by reading a column of character values in a text file (e.g. `.csv`), R assigns the integer values in alphabetical order rather than in the order they appear in the file. You can argue whether that makes sense, but `ggplot2` then does the logical thing, which is to display the factor levels in order of their integer values. Your complaint is with `read.table`, not `ggplot2`. – Drew Steen Mar 25 '14 at 14:06

One can also simply create the factor directly within the aes() call. I am not sure why setting the limits doesn't work for you; I assume you get NAs because of typos in your level vector.

The code below is not much different from Drew Steen's answer, with the important difference that it does not change the original data frame.

library(ggplot2)
## this vector might be useful for other plots/analyses
level_order <- c('virginica', 'versicolor', 'setosa') 

p <- ggplot(iris)
p + geom_bar(aes(x = factor(Species, levels = level_order)))


## or directly in the aes() call without a pre-created vector:
p + geom_bar(aes(x = factor(Species, levels = c('virginica', 'versicolor', 'setosa'))))
## plot identical to the above - not shown

## or use your vector as limits in scale_x_discrete
p + geom_bar(aes(x = Species)) + 
  scale_x_discrete(limits = level_order) 

Created on 2022-11-20 with reprex v2.0.2
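
Applied to the heatmap from the question, the same idea might look like the sketch below. The data frame, its values, and the `treatment_order` vector are invented for illustration; only the column names Treatment, organisms, and S come from the question.

```r
library(ggplot2)

# Made-up data using the question's column names (values are hypothetical)
heat_data <- data.frame(
  Treatment = rep(c("Y", "X", "Z"), times = 2),
  organisms = rep(c("org_a", "org_b"), each = 3),
  S = c(0.1, 0.5, 0.9, 0.4, 0.8, 0.2),
  stringsAsFactors = FALSE
)

# Order of first appearance in the data (hypothetical helper vector)
treatment_order <- unique(heat_data$Treatment)

# Set the level order inside aes() so the data frame itself is unchanged
p <- ggplot(heat_data,
            aes(x = factor(Treatment, levels = treatment_order),
                y = organisms)) +
  geom_tile(aes(fill = S)) +
  scale_fill_gradient(low = "black", high = "red") +
  labs(x = "Treatment")
```

Printing `p` should draw the tiles with the x axis in the order Y, X, Z rather than alphabetically.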

tjebo
  • I prefer using `level_order <- c('virginica', 'versicolor', 'setosa')` followed by `ggplot(iris, aes(x = Species, y = Petal.Width)) + geom_col() + scale_x_discrete(limits = level_order)` – Ceres Jul 13 '22 at 00:07
  • 1
    @Ceres this is of course also a great option. Thanks for sharing. as the question is closed you won't be able to add your suggestion as an answer, therefore I have added it to mine. – tjebo Nov 20 '22 at 13:28