
I want to add column and row totals to my heatmap, and I'm struggling with the approaches already suggested in other posts, like this one: ggplot2: Independent Continuous Fill for Summary Row & Column.

The problem with the above post is that I do not understand the code where the totals (row, column) are created. Although it is marked as "# create the summary row & column", I don't get it.

So it would be great if:
  1. somebody could help me by showing a (simple) way to get there, based on my code posted below, and
  2. the row and column totals could have separate color scales.

I tried this...

# create sample
scen <- 1:32
ls <- rep(1:7, length(scen))
df <- data.frame(Landscape = ls, Scenario = scen)
df$SoP <- sample(seq(-0.070, 0.070, by = 0.01), nrow(df), replace = TRUE)
df$Landscape_Name <- LETTERS[1:7]

# create heatmap
library(ggplot2)
df.diff <- ggplot(df, aes(x = Landscape_Name, y = Scenario)) +
  geom_tile(aes(fill = SoP)) +
  geom_text(size = 3, aes(label = round(SoP,2))) + #displays cell values
  scale_fill_gradient2(low = "gold", #colors
                       mid = "white",
                       high = "grey",
                       midpoint = 0) +
  theme(panel.grid.major.x=element_blank(), #no gridlines
        panel.grid.minor.x=element_blank(), 
        panel.grid.major.y=element_blank(), 
        panel.grid.minor.y=element_blank(),
        panel.background=element_rect(fill="white"),
        axis.text.x = element_text(angle=0, hjust = 0.5,vjust=0.5, size = 8,face = NULL),
        axis.text.y = element_text(size = 8,face = NULL),
        plot.title = element_text(size=10,face="bold")) +
  ggtitle("Treatment efficiency") + 
  theme(legend.title=element_text(face="bold", size=8)) + 
  scale_x_discrete(name="Landscape", position = "top") +
  scale_y_discrete(name="Scenario") +
  labs(fill="SoP")
print(df.diff)

Thanks a lot for your help!

Boombardeiro
  • What were the problems you faced with the method suggested in the post you referenced? Also, it's a bit tricky to help without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example); can you add the output of `dput(Delta)` to your question? – DS_UNI Apr 23 '19 at 13:55
  • I simplified the code (and the question itself) and added a reproducible sample for a better understanding. My challenge is understanding how to apply the suggested post (link) to my code. I don't think it's complicated, but somehow I don't get it. – Boombardeiro Apr 24 '19 at 08:10

1 Answer

Let's see if I can explain the answer in the post you referenced, i.e., ggplot2: Independent Continuous Fill for Summary Row & Column.

First a couple of notes:

  • On the y axis you're plotting a vector of numbers, which ggplot2 treats as a continuous scale; that's why the axis labels disappear when you use scale_y_discrete. The plot itself still works, but once we add a new value to the axis (i.e., Total) this will cause problems, which is why I think Scenario should be a character vector.
  • Turning the column Scenario into strings with as.character will mess up the sorting of the values, because strings are sorted alphabetically ("10" comes before "2"); try for example running sort(as.character(1:20)). This can be avoided by using 2-digit numbers (01, 02, 03, ...), which is what I did below.
  • In the answer mentioned above, the totals were combined with the original df; here I will keep them as separate data frames to make it easier to understand (or at least I think it's easier this way).
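To illustrate the sorting point, here is a small example using 1:12 instead of the full 32 scenarios. It compares plain as.character() with the sprintf('%02d', ...) zero-padding used below:

```r
# character versions of numbers sort alphabetically, not numerically
unpadded <- as.character(1:12)
sort(unpadded)
# "1" "10" "11" "12" "2" "3" "4" "5" "6" "7" "8" "9"

# zero-padded to two digits, alphabetical order matches numeric order
padded <- sprintf('%02d', 1:12)
sort(padded)
# "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
```

Two digits are enough here because there are at most 32 scenarios. With 100 or more you would need '%03d'.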

So here we go:

library(ggplot2)
library(dplyr)

# pad numbers with zeros to get 2 digit numbers, this will be a string
scen <- sprintf('%02d', 1:32)
ls <- rep(1:7, length(scen))
df <- data.frame(Landscape = ls, Scenario = scen)
df$SoP <- sample(seq(-0.070, 0.070, by = 0.01), nrow(df), replace = TRUE)
df$Landscape_Name <- LETTERS[1:7]

# create the main plot, and take a look at it
df.diff <- ggplot(df, aes(x = Landscape_Name, y = Scenario)) +
  geom_tile(aes(fill = SoP)) +
  geom_text(size = 3, aes(label = round(SoP,2))) + #displays cell values
  scale_fill_gradient2(low = "gold", #colors
                       mid = "white",
                       high = "grey",
                       midpoint = 0) 

df.diff

Now we want the data which allows us to add an extra category to Landscape_Name and an extra category to Scenario, such that:

  • The category added to Landscape_Name (the horizontal sums) is the sum of all SoP for each Scenario, and
  • The category added to Scenario (the vertical sums) is the sum of all SoP for each Landscape_Name

Basically, we need group_by() and summarise() with sum():

h_total <- df %>% 
  group_by(Scenario) %>% 
  summarise(SoP = sum(SoP)) %>% 
  mutate(Landscape_Name = 'Total')


v_total <- df %>% 
  group_by(Landscape_Name) %>% 
  summarise(SoP = sum(SoP)) %>% 
  mutate(Scenario = 'Total')
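If you want to sanity-check the totals, or prefer not to load dplyr, the same two tables can also be built with base R's aggregate(). This is an alternative sketch rather than part of the approach above. The sample data is rebuilt under a separate name (df_check), with set.seed added only for reproducibility, so the objects above stay untouched. Both sets of totals must add up to the grand total of SoP:

```r
# rebuild the sample data under a separate name (set.seed is an addition)
set.seed(1)
scen_check <- sprintf('%02d', 1:32)
df_check <- data.frame(Landscape = rep(1:7, length(scen_check)), Scenario = scen_check)
df_check$SoP <- sample(seq(-0.070, 0.070, by = 0.01), nrow(df_check), replace = TRUE)
df_check$Landscape_Name <- LETTERS[1:7]

# horizontal sums: one value per Scenario, shown in a new 'Total' column
h_total_base <- aggregate(SoP ~ Scenario, data = df_check, FUN = sum)
h_total_base$Landscape_Name <- 'Total'

# vertical sums: one value per Landscape_Name, shown in a new 'Total' row
v_total_base <- aggregate(SoP ~ Landscape_Name, data = df_check, FUN = sum)
v_total_base$Scenario <- 'Total'

# both sets of totals add up to the grand total
all.equal(sum(h_total_base$SoP), sum(df_check$SoP))  # TRUE
all.equal(sum(v_total_base$SoP), sum(df_check$SoP))  # TRUE
```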

Now we can add the grouped data to the original plot with geom_point. Because we used the same column names in the new data, the x and y aesthetics are inherited from the original plot. To get a different color scheme from the original plot, we map color (not fill), which works fine with the chosen shape (19, a solid circle).

If you want the cell values for the totals as well, you have to add geom_text layers for those too.

p <- df.diff + 
  geom_point(data = h_total, 
             aes(color = SoP), 
             size = 10, 
             shape = 19) +
  geom_point(data = v_total, 
             aes(color = SoP), 
             size = 10, 
             shape = 19) +
  scale_color_gradient2(low = "red", #colors
                        mid = "white",
                        high = "grey",
                        midpoint = 0) +
  geom_text(data = h_total, size = 3, aes(label = round(SoP,2))) +
  geom_text(data = v_total, size = 3, aes(label = round(SoP,2)))

p
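The code above maps both totals to one shared color scale, so the row and column totals are colored by the combined range of both (point 2 of the question and the follow-up in the comments). One way to give them independent scales, which is my assumption and not part of the approach above, is the ggnewscale package: layers added after new_scale_color() map color to a second scale. The legend names and the "blue" low color are illustrative choices, and set.seed is added only for reproducibility:

```r
library(ggplot2)
library(dplyr)
library(ggnewscale)  # extra package (assumption): install.packages("ggnewscale")

# same sample data and totals as above (set.seed added for reproducibility)
set.seed(1)
scen <- sprintf('%02d', 1:32)
df <- data.frame(Landscape = rep(1:7, length(scen)), Scenario = scen)
df$SoP <- sample(seq(-0.070, 0.070, by = 0.01), nrow(df), replace = TRUE)
df$Landscape_Name <- LETTERS[1:7]

h_total <- df %>% group_by(Scenario) %>% summarise(SoP = sum(SoP)) %>%
  mutate(Landscape_Name = 'Total')
v_total <- df %>% group_by(Landscape_Name) %>% summarise(SoP = sum(SoP)) %>%
  mutate(Scenario = 'Total')

p2 <- ggplot(df, aes(x = Landscape_Name, y = Scenario)) +
  geom_tile(aes(fill = SoP)) +
  geom_text(size = 3, aes(label = round(SoP, 2))) +
  scale_fill_gradient2(low = "gold", mid = "white", high = "grey", midpoint = 0) +
  # first color scale: row totals (the 'Total' column)
  geom_point(data = h_total, aes(color = SoP), size = 10, shape = 19) +
  scale_color_gradient2(low = "red", mid = "white", high = "grey",
                        midpoint = 0, name = "Row total") +
  # layers added after new_scale_color() use a second, independent color scale
  new_scale_color() +
  geom_point(data = v_total, aes(color = SoP), size = 10, shape = 19) +
  scale_color_gradient2(low = "blue", mid = "white", high = "grey",
                        midpoint = 0, name = "Column total") +
  # labels for the totals (text color is not mapped, so no scale is involved)
  geom_text(data = h_total, size = 3, aes(label = round(SoP, 2))) +
  geom_text(data = v_total, size = 3, aes(label = round(SoP, 2)))

p2
```

Each color scale is placed directly after the layers it belongs to, following the pattern in the ggnewscale documentation. The theme and axis settings from the next step can be added to p2 in the same way.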

Finally, add the theme customisations, the title, and the axis and legend labels:

p  +
  theme(panel.grid.major.x=element_blank(), #no gridlines
        panel.grid.minor.x=element_blank(), 
        panel.grid.major.y=element_blank(), 
        panel.grid.minor.y=element_blank(),
        panel.background=element_rect(fill="white"),
        axis.text.x = element_text(angle=0, hjust = 0.5,vjust=0.5, size = 8,face = NULL),
        axis.text.y = element_text(size = 8,face = NULL),
        plot.title = element_text(size=10,face="bold"),
        legend.title=element_text(face="bold", size=8))  + 
  scale_x_discrete(name="Landscape", position = "top") +
  scale_y_discrete(name="Scenario", 
                  # if you want the total to be at the bottom instead of at the top, 
                  # you can set the limits of y  with the reversed order of the categories 
                  limits = rev(c(unique(as.character(df$Scenario)), 'Total'))) + 
  # here you can change the y/x aspect ratio
  coord_fixed(ratio = 0.4) +
  labs(fill="SoP", color ="SoP Total") +
  ggtitle("Treatment efficiency")

I finally saved the plot with ggsave('PATH/TO/plot.jpeg', width = 20, height = 40, units = 'cm')

and this was the output:

[Image: the resulting heatmap with the Total row and column]

DS_UNI
  • Great approach! Thanks a lot for your help. – Boombardeiro Apr 26 '19 at 17:27
  • Glad to help! Please accept my answer if it solved your problem, I would really appreciate that, as well as others, since that could help those who might face similar problems. – DS_UNI Apr 29 '19 at 12:24
  • One follow-up question: is it possible to set up separate colors for the x- and y-axis totals? Currently, the color scale spans the min and max values of both totals together, but I would need to separate it so that the x-axis totals and the y-axis totals each get their own scale. Thanks a lot for your help, again! – Boombardeiro Oct 15 '19 at 11:48
  • This is really great (thank you), but it failed for me when you need to re-order the heatmap rows and columns. Or in other words as the rows and columns are ordered alphabetically it all works. But if the landscape name was e.g. ...S, T, U... then Total would appear in the middle of the heat map. I couldn't figure this out myself using factor() for ordering? – James Jan 06 '22 at 13:51
  • I found a way to do this. Not with factors, but by ordering the x and y axis labels using `scale_x_discrete(limits = c(my_var_names, "Total"), position = "top")` etc. Inspiration came from this post: https://stackoverflow.com/questions/65856117/ggplot2-how-to-add-total-to-a-heatmap?rq=1 – James Jan 06 '22 at 14:23
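A sketch of that fix, using hypothetical landscape names "S", "T", "U" (which sort around "Total") and only four scenarios to keep it short. The data setup and totals follow the answer's code, and set.seed is added for reproducibility:

```r
library(ggplot2)
library(dplyr)

# hypothetical names that sort around 'Total' alphabetically
landscape_names <- c("S", "T", "U")
scenario_names <- sprintf('%02d', 1:4)

set.seed(1)
df_demo <- expand.grid(Landscape_Name = landscape_names,
                       Scenario = scenario_names,
                       stringsAsFactors = FALSE)
df_demo$SoP <- sample(seq(-0.070, 0.070, by = 0.01), nrow(df_demo), replace = TRUE)

h_demo <- df_demo %>% group_by(Scenario) %>% summarise(SoP = sum(SoP)) %>%
  mutate(Landscape_Name = 'Total')
v_demo <- df_demo %>% group_by(Landscape_Name) %>% summarise(SoP = sum(SoP)) %>%
  mutate(Scenario = 'Total')

# default (alphabetical) order would put 'Total' between "T" and "U"
sort(c(landscape_names, "Total"))
# "S" "T" "Total" "U"

p_demo <- ggplot(df_demo, aes(x = Landscape_Name, y = Scenario)) +
  geom_tile(aes(fill = SoP)) +
  geom_point(data = h_demo, aes(color = SoP), size = 10, shape = 19) +
  geom_point(data = v_demo, aes(color = SoP), size = 10, shape = 19) +
  # explicit limits: the names in their intended order, then 'Total' last
  scale_x_discrete(limits = c(landscape_names, "Total"), position = "top") +
  # reversed so that 'Total' is the bottom row, as in the answer
  scale_y_discrete(limits = rev(c(scenario_names, "Total")))

p_demo
```

Because the limits vector fixes the category order directly, this works regardless of how the names would sort alphabetically.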