3

I'm trying to make a heatmap with pairwise-Fst values (statistic ranging from 0 to 1) in the lower left triangle and the associated p-value in the upper right triangle. I'd like the order along the axes to be: MSI, WB, JI, QC. I can make two heatmaps with geom_tile in ggplot2 and Frankenstein them together (see picture below of what I'm looking for), but this is not a reproducible/good approach.

The dataframe I used to make the square heatmap is as follows (named pairwise_fst_maxmeanDP_df):

structure(list(Order = c(1, 11, 14, 16, 5, 2, 12, 15, 8, 6, 3, 
13, 10, 9, 7, 4), Population = c("MSI", "MSI", "MSI", "MSI", 
"WB", "WB", "WB", "WB", "JI", "JI", "JI", "JI", "QC", "QC", "QC", 
"QC"), Comparison_Pop = c("MSI", "WB", "JI", "QC", "MSI", "WB", 
"JI", "QC", "MSI", "WB", "JI", "QC", "MSI", "WB", "JI", "QC"), 
    COMBO = c(0, 0, 0, 0, 0.008, 0, 0.001, 0, 0.007, 0.001, 0, 
    0, 0.009, 0.003, 0.002, 0), F_ST = c(0, 0.008, 0.007, 0.009, 
    0.008, 0, 0.001, 0.003, 0.007, 0.001, 0, 0.002, 0.009, 0.003, 
    0.002, 0), P_Value = c(0, 0, 0, 0, 0, 0, 0.001, 0, 0, 0.001, 
    0, 0, 0, 0, 0, 0), F_ST_Lab = c("\"\"", "8.0000000000000002E-3", 
    "7.0000000000000001E-3", "8.9999999999999993E-3", "8.0000000000000002E-3", 
    "\"\"", "1E-3", "3.0000000000000001E-3", "7.0000000000000001E-3", 
    "1E-3", "\"\"", "2E-3", "8.9999999999999993E-3", "3.0000000000000001E-3", 
    "2E-3", "\"\""), P_Value_Lab = c("(<0.001)", "(<0.001)", 
    "(<0.001)", "(<0.001)", "(<0.001)", "(<0.001)", "(0.001)", 
    "(<0.001)", "(<0.001)", "(0.001)", "(<0.001)", "(<0.001)", 
    "(<0.001)", "(<0.001)", "(<0.001)", "(<0.001)")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -16L))

The dataframe I used to try and make the 'triangle' heatmap is as follows (named pairwise_fst_maxmeanDP_df_lowerTri):

structure(list(Population = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 3L, 3L, 4L), levels = c("MSI", "WB", "JI", "QC"), class = "factor"), 
    Comparison_Pop = structure(c(1L, 2L, 3L, 4L, 2L, 3L, 4L, 
    3L, 4L, 4L), levels = c("MSI", "WB", "JI", "QC"), class = "factor"), 
    COMBO = c(0, 0, 0, 0, 0, 0.001, 0, 0, 0, 0), F_ST = c(0, 
    0.008, 0.007, 0.009, 0, 0.001, 0.003, 0, 0.002, 0), P_Value = c(0, 
    0, 0, 0, 0, 0.001, 0, 0, 0, 0), F_ST_Lab = c("\"\"", "8.0000000000000002E-3", 
    "7.0000000000000001E-3", "8.9999999999999993E-3", "\"\"", 
    "1E-3", "3.0000000000000001E-3", "\"\"", "2E-3", "\"\""), 
    P_Value_Lab = c("(<0.001)", "(<0.001)", "(<0.001)", "(<0.001)", 
    "(<0.001)", "(0.001)", "(<0.001)", "(<0.001)", "(<0.001)", 
    "(<0.001)"), Label = c("n/n", "0.008/(<0.001)", "0.007/(<0.001)", 
    "0.009/(<0.001)", "n/n", "0.001/(0.001)", "0.003/(<0.001)", 
    "n/n", "0.002/(<0.001)", "n/n"), Pair = c("1,1", "1,2", "1,3", 
    "1,4", "2,2", "2,3", "2,4", "3,3", "3,4", "4,4")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

The code I used to generate the first square heatmap (Fst) is as follows:

fst_maxmeanDP_rmRelated <- ggplot(pairwise_fst_maxmeanDP_df, aes(Population, Comparison_Pop)) + 
  geom_tile(aes(fill = F_ST), colour = "white") + 
  scale_fill_gradient(low = "white", high = "#7570b3", bquote(F[ST])) +
  geom_text(aes(label=pairwise_fst_maxmeanDP_df$F_ST),colour="black") +
  xlab("Population") + ylab("Population") + theme_bw()

I tried extracting the triangles and combining them into a new plot, but the logistics of lining those up was above my head.

I would also be happy having a single lower triangle, colored based on the Fst values, with the Fst value and p-value written on the cell. This is along the lines of the answer here: How to do a triangle heatmap in R using ggplot2, reshape2, and Hmisc?, but I couldn't get this to work with my data (the heatmap cells weren't all in the same triangle). I tried simpler code without the labels, but the graph wasn't lined up:

ggplot(pairwise_fst_maxmeanDP_df_lowerTri, aes(Population, Comparison_Pop)) +
  theme_bw() +
  xlab('Population') +
  ylab('Population') +
  geom_tile(aes(fill = F_ST), color='white') +
  scale_fill_gradient(low = 'white', high = 'darkblue', space = 'Lab') +
  theme(axis.text.x=element_text(angle=90),
        axis.ticks=element_blank(),
        axis.line=element_blank(),
        panel.border=element_blank(),
        panel.grid.major=element_line(color='#eeeeee'))

I've tried a lot of iterations of code and can't seem to find something that will suit my needs, and would sincerely appreciate any help!

Picture of desired output:

Picture of desired output

Quinn
  • 59
  • 6

3 Answers3

3

You can do all this using only your original data frame, with some basic data manipulation and conditional aesthetic mapping:

library(tidyverse)

pairwise_fst_maxmeanDP_df %>%
  mutate(across(contains("op"), ~factor(.x, c("MSI", "WB", "JI", "QC")))) %>%
  mutate(is_upper = as.numeric(Comparison_Pop) > as.numeric(Population)) %>%
  mutate(is_diag = as.numeric(Comparison_Pop) == as.numeric(Population)) %>%
  ggplot(aes(Population, Comparison_Pop)) +
  geom_tile(aes(fill = ifelse(is_upper, 0, F_ST)), colour = "gray") +
  geom_text(aes(label = ifelse(is_upper, P_Value_Lab, F_ST))) +
  geom_tile(aes(alpha = ifelse(is_diag, 1, 0)), fill = "gray") +
  scale_fill_gradient("F_ST", low = "white", high = "#7b77b6") +
  scale_alpha_identity() +
  coord_equal(expand = FALSE)

enter image description here

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Very elegant! +1 – Johannes Titz Aug 17 '23 at 08:21
  • Thanks so much; this is exactly what I need! I'd like to clarify a few things the code is doing. The `mutate across` line is making any column with 'op' in the name a factor and ordering the levels? The `is_diag` line creates a new column, where the Population is the same as the Comparison_Pop and these get filled with gray. However, I don't fully understand the line creating `is_uppper` and how it's used in the graph. Additionally, what is `scale_alpha_identity()` doing? It looks like alpha controls transparency, so is that its function here? Removing that line adds a legend for `is_diag`. – Quinn Aug 17 '23 at 13:14
  • You are right about the mutate line - it just ensures that both p**op**ulation variables get turned into factors with the same levels. The `is_diag` line identifies the diagonals. This is just a mechanism for making the diagonal boxes gray. We do that by mapping it to the `alpha` of a second `geom_tile` layer that is filled with gray. The tiles are completely transparent if they are not on the diagonal, and fully opaque if they are, because the `ifelse` converts the TRUE/FALSE to 1/0 which is interpreted literally by `scale_alpha_identity` – Allan Cameron Aug 17 '23 at 13:32
  • The `is_upper` variable just tests whether the tile is in the upper triangle or not. We use this via another `ifelse` to switch whether the `geom_text` label is the p value or the F_ST value – Allan Cameron Aug 17 '23 at 13:33
2

Recently, I made a similar plot for a publication. I prefer plot.matrix for heatmaps, but I think this can be adopted to ggplot2. My solution looks like this (your first data frame is called d here):

# please install, if you do not have them
library(plot.matrix, Matrix, reshape2)

d <- pairwise_fst_maxmeanDP_df

# create the two matrices
d2 <- d[, c(2, 3, 5)]
d3 <- d[, c(2, 3, 6)]
m1 <- reshape2::acast(d2, Population ~ Comparison_Pop, values_from = "F_ST")
m2 <- reshape2::acast(d3, Population ~ Comparison_Pop, values_from = "P_Value")

# combine into a single matrix
a <- tril(m1) # lower triangular matrix
b <- triu(m2) # upper triangular matrix
c <- a + b
diag(c) <- NA

# plot
plot(as.matrix(c), 
     fmt.cell='%.3f',  # cell content, rounded to 3 digits
     main = "", #title
     key = list(side = 3, cex.axis = 0.75), # position of legend
     xlab = "Population", 
     ylab = "Population", 
     col = colorRampPalette(c("white", "#7b77b6")), # color
     na.col = "gray", na.print = F # handling NA
)

You can adjust these to your needs, but I think this should get you going.

The final result looks like this:

enter image description here

You can of course swap the triangular matrix parts and adjust colors, labels, etc. to your needs.

Johannes Titz
  • 972
  • 6
  • 11
  • 1
    Thank you! The above answer works exactly for my situation, but this will be useful for future graphs and uses code I'm more familiar with. – Quinn Aug 17 '23 at 13:15
0

If approaches outside of ggplot2 are ok, using gplots::heatmap.2

Generating the matrix from lower triangle of df$F_ST and upper triangle of df$P_Value

mat1_F_ST <- matrix(df$F_ST, nrow=4)
mat2_P_Value <- matrix(df$P_Value, nrow=4)
mat3 <- replace(mat1_F_ST, is.numeric(mat1_F_ST), 0)

mat3 <- replace(mat3, lower.tri(mat1_F_ST), mat1[lower.tri(mat1_F_ST)])
mat3 <- replace(mat3, upper.tri(mat2_P_Value), mat2[upper.tri(mat2_P_Value)])

diag(mat3) <- NA

colnames(mat3) <- unique(df$Population)
rownames(mat3) <- unique(df$Population)

Plotting

library(gplots)

heatmap.2(mat3, trace="none", Colv = NA, Rowv = NA, dendrogram="none", 
  col=colorRampPalette(c("white", "purple2"))(50), 
  cellnote=mat3, notecol="gray20", xlab = "F_ST", ylab = "P_Value")

gplots heatmap.2

Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29