Heatmap with different data in upper and lower triangles

Question

I'm trying to make a heatmap with pairwise-Fst values (statistic ranging from 0 to 1) in the lower left triangle and the associated p-value in the upper right triangle. I'd like the order along the axes to be: MSI, WB, JI, QC. I can make two heatmaps with geom_tile in ggplot2 and Frankenstein them together (see picture below of what I'm looking for), but this is not a reproducible/good approach.

The dataframe I used to make the square heatmap is as follows (named pairwise_fst_maxmeanDP_df):

structure(list(Order = c(1, 11, 14, 16, 5, 2, 12, 15, 8, 6, 3, 
13, 10, 9, 7, 4), Population = c("MSI", "MSI", "MSI", "MSI", 
"WB", "WB", "WB", "WB", "JI", "JI", "JI", "JI", "QC", "QC", "QC", 
"QC"), Comparison_Pop = c("MSI", "WB", "JI", "QC", "MSI", "WB", 
"JI", "QC", "MSI", "WB", "JI", "QC", "MSI", "WB", "JI", "QC"), 
    COMBO = c(0, 0, 0, 0, 0.008, 0, 0.001, 0, 0.007, 0.001, 0, 
    0, 0.009, 0.003, 0.002, 0), F_ST = c(0, 0.008, 0.007, 0.009, 
    0.008, 0, 0.001, 0.003, 0.007, 0.001, 0, 0.002, 0.009, 0.003, 
    0.002, 0), P_Value = c(0, 0, 0, 0, 0, 0, 0.001, 0, 0, 0.001, 
    0, 0, 0, 0, 0, 0), F_ST_Lab = c("\"\"", "8.0000000000000002E-3", 
    "7.0000000000000001E-3", "8.9999999999999993E-3", "8.0000000000000002E-3", 
    "\"\"", "1E-3", "3.0000000000000001E-3", "7.0000000000000001E-3", 
    "1E-3", "\"\"", "2E-3", "8.9999999999999993E-3", "3.0000000000000001E-3", 
    "2E-3", "\"\""), P_Value_Lab = c("(<0.001)", "(<0.001)", 
    "(<0.001)", "(<0.001)", "(<0.001)", "(<0.001)", "(0.001)", 
    "(<0.001)", "(<0.001)", "(0.001)", "(<0.001)", "(<0.001)", 
    "(<0.001)", "(<0.001)", "(<0.001)", "(<0.001)")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -16L))

The dataframe I used to try and make the 'triangle' heatmap is as follows (named pairwise_fst_maxmeanDP_df_lowerTri):

structure(list(Population = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 3L, 3L, 4L), levels = c("MSI", "WB", "JI", "QC"), class = "factor"), 
    Comparison_Pop = structure(c(1L, 2L, 3L, 4L, 2L, 3L, 4L, 
    3L, 4L, 4L), levels = c("MSI", "WB", "JI", "QC"), class = "factor"), 
    COMBO = c(0, 0, 0, 0, 0, 0.001, 0, 0, 0, 0), F_ST = c(0, 
    0.008, 0.007, 0.009, 0, 0.001, 0.003, 0, 0.002, 0), P_Value = c(0, 
    0, 0, 0, 0, 0.001, 0, 0, 0, 0), F_ST_Lab = c("\"\"", "8.0000000000000002E-3", 
    "7.0000000000000001E-3", "8.9999999999999993E-3", "\"\"", 
    "1E-3", "3.0000000000000001E-3", "\"\"", "2E-3", "\"\""), 
    P_Value_Lab = c("(<0.001)", "(<0.001)", "(<0.001)", "(<0.001)", 
    "(<0.001)", "(0.001)", "(<0.001)", "(<0.001)", "(<0.001)", 
    "(<0.001)"), Label = c("n/n", "0.008/(<0.001)", "0.007/(<0.001)", 
    "0.009/(<0.001)", "n/n", "0.001/(0.001)", "0.003/(<0.001)", 
    "n/n", "0.002/(<0.001)", "n/n"), Pair = c("1,1", "1,2", "1,3", 
    "1,4", "2,2", "2,3", "2,4", "3,3", "3,4", "4,4")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

The code I used to generate the first square heatmap (Fst) is as follows:

fst_maxmeanDP_rmRelated <- ggplot(pairwise_fst_maxmeanDP_df, aes(Population, Comparison_Pop)) + 
  geom_tile(aes(fill = F_ST), colour = "white") + 
  scale_fill_gradient(low = "white", high = "#7570b3", bquote(F[ST])) +
  geom_text(aes(label=pairwise_fst_maxmeanDP_df$F_ST),colour="black") +
  xlab("Population") + ylab("Population") + theme_bw()

I tried extracting the triangles and combining them into a new plot, but the logistics of lining those up was above my head.

I would also be happy having a single lower triangle, colored based on the Fst values, with the Fst value and p-value written on the cell. This is along the lines of the answer here: How to do a triangle heatmap in R using ggplot2, reshape2, and Hmisc?, but I couldn't get this to work with my data (the heatmap cells weren't all in the same triangle). I tried simpler code without the labels, but the graph wasn't lined up:

ggplot(pairwise_fst_maxmeanDP_df_lowerTri, aes(Population, Comparison_Pop)) +
  theme_bw() +
  xlab('Population') +
  ylab('Population') +
  geom_tile(aes(fill = F_ST), color='white') +
  scale_fill_gradient(low = 'white', high = 'darkblue', space = 'Lab') +
  theme(axis.text.x=element_text(angle=90),
        axis.ticks=element_blank(),
        axis.line=element_blank(),
        panel.border=element_blank(),
        panel.grid.major=element_line(color='#eeeeee'))

I've tried a lot of iterations of code and can't seem to find something that will suit my needs, and would sincerely appreciate any help!

Picture of desired output:

score 3 · Accepted Answer · answered Aug 16 '23 at 18:41

3

You can do all this using only your original data frame, with some basic data manipulation and conditional aesthetic mapping:

library(tidyverse)

pairwise_fst_maxmeanDP_df %>%
  mutate(across(contains("op"), ~factor(.x, c("MSI", "WB", "JI", "QC")))) %>%
  mutate(is_upper = as.numeric(Comparison_Pop) > as.numeric(Population)) %>%
  mutate(is_diag = as.numeric(Comparison_Pop) == as.numeric(Population)) %>%
  ggplot(aes(Population, Comparison_Pop)) +
  geom_tile(aes(fill = ifelse(is_upper, 0, F_ST)), colour = "gray") +
  geom_text(aes(label = ifelse(is_upper, P_Value_Lab, F_ST))) +
  geom_tile(aes(alpha = ifelse(is_diag, 1, 0)), fill = "gray") +
  scale_fill_gradient("F_ST", low = "white", high = "#7b77b6") +
  scale_alpha_identity() +
  coord_equal(expand = FALSE)

answered Aug 16 '23 at 18:41

Allan Cameron

147,086
7
49
87

Very elegant! +1 – Johannes Titz Aug 17 '23 at 08:21
Thanks so much; this is exactly what I need! I'd like to clarify a few things the code is doing. The `mutate across` line is making any column with 'op' in the name a factor and ordering the levels? The `is_diag` line creates a new column, where the Population is the same as the Comparison_Pop and these get filled with gray. However, I don't fully understand the line creating `is_uppper` and how it's used in the graph. Additionally, what is `scale_alpha_identity()` doing? It looks like alpha controls transparency, so is that its function here? Removing that line adds a legend for `is_diag`. – Quinn Aug 17 '23 at 13:14
You are right about the mutate line - it just ensures that both p**op**ulation variables get turned into factors with the same levels. The `is_diag` line identifies the diagonals. This is just a mechanism for making the diagonal boxes gray. We do that by mapping it to the `alpha` of a second `geom_tile` layer that is filled with gray. The tiles are completely transparent if they are not on the diagonal, and fully opaque if they are, because the `ifelse` converts the TRUE/FALSE to 1/0 which is interpreted literally by `scale_alpha_identity` – Allan Cameron Aug 17 '23 at 13:32
The `is_upper` variable just tests whether the tile is in the upper triangle or not. We use this via another `ifelse` to switch whether the `geom_text` label is the p value or the F_ST value – Allan Cameron Aug 17 '23 at 13:33

Johannes Titz · Answer 2 · 2023-08-17T08:07:08.100

Recently, I made a similar plot for a publication. I prefer plot.matrix for heatmaps, but I think this can be adopted to ggplot2. My solution looks like this (your first data frame is called d here):

# please install, if you do not have them
library(plot.matrix, Matrix, reshape2)

d <- pairwise_fst_maxmeanDP_df

# create the two matrices
d2 <- d[, c(2, 3, 5)]
d3 <- d[, c(2, 3, 6)]
m1 <- reshape2::acast(d2, Population ~ Comparison_Pop, values_from = "F_ST")
m2 <- reshape2::acast(d3, Population ~ Comparison_Pop, values_from = "P_Value")

# combine into a single matrix
a <- tril(m1) # lower triangular matrix
b <- triu(m2) # upper triangular matrix
c <- a + b
diag(c) <- NA

# plot
plot(as.matrix(c), 
     fmt.cell='%.3f',  # cell content, rounded to 3 digits
     main = "", #title
     key = list(side = 3, cex.axis = 0.75), # position of legend
     xlab = "Population", 
     ylab = "Population", 
     col = colorRampPalette(c("white", "#7b77b6")), # color
     na.col = "gray", na.print = F # handling NA
)

You can adjust these to your needs, but I think this should get you going.

The final result looks like this:

You can of course swap the triangular matrix parts and adjust colors, labels, etc. to your needs.

Thank you! The above answer works exactly for my situation, but this will be useful for future graphs and uses code I'm more familiar with. — Quinn, Aug 17 '23 at 13:15

score 0 · Answer 3 · answered Aug 16 '23 at 19:02

If approaches outside of ggplot2 are ok, using gplots::heatmap.2

Generating the matrix from lower triangle of df$F_ST and upper triangle of df$P_Value

mat1_F_ST <- matrix(df$F_ST, nrow=4)
mat2_P_Value <- matrix(df$P_Value, nrow=4)
mat3 <- replace(mat1_F_ST, is.numeric(mat1_F_ST), 0)

mat3 <- replace(mat3, lower.tri(mat1_F_ST), mat1[lower.tri(mat1_F_ST)])
mat3 <- replace(mat3, upper.tri(mat2_P_Value), mat2[upper.tri(mat2_P_Value)])

diag(mat3) <- NA

colnames(mat3) <- unique(df$Population)
rownames(mat3) <- unique(df$Population)

Plotting

library(gplots)

heatmap.2(mat3, trace="none", Colv = NA, Rowv = NA, dendrogram="none", 
  col=colorRampPalette(c("white", "purple2"))(50), 
  cellnote=mat3, notecol="gray20", xlab = "F_ST", ylab = "P_Value")

Heatmap with different data in upper and lower triangles

3 Answers3