
I currently have a heatmap created with geom_tile() that is filled by counts. However, the counts range from 1 up to 300000+, and some of them are NA. I would like a larger variation in colors; how can I achieve that?


Asked by nubprog (edited by Nimantha)
  • We might need more than 2 colours, see https://cran.r-project.org/package=RColorBrewer and use `scale_fill_manual`. – zx8754 Jan 20 '22 at 09:11
  • See also [`scale_fill_gradientn`](https://ggplot2.tidyverse.org/reference/scale_gradient.html) and its `values` argument; e.g. [Is it possible to define the "mid" range in scale_fill_gradient2()?](https://stackoverflow.com/a/21758729); [Increase resolution of color scale for values close to zero](https://stackoverflow.com/a/20584038). Also, it is easier to help if you provide a _minimal_ reproducible example. Cheers – Henrik Jan 20 '22 at 09:39
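Following the `scale_fill_gradientn()` / `values` suggestion in the comments, here is a minimal sketch of the idea, assuming made-up data that mimics the question (counts from 1 to roughly 300000, plus some NA). The stop positions and colours are arbitrary choices for illustration. The point is that `values` takes positions on the 0-1 scale, so rescaling a few hand-picked counts onto that scale puts most of the colour change at the low end, where most of the data sits.

```r
library(ggplot2)

# Hypothetical skewed counts standing in for the question's data:
# 1 to roughly 300000, log-uniformly spread, with 20 missing values
set.seed(42)
df <- expand.grid(x = 1:20, y = 1:20)
df$n <- round(10^runif(nrow(df), min = 0, max = 5.5))
df$n[sample(nrow(df), 20)] <- NA

# Counts at which the colour should change; the first and last stops must
# cover the full data range, otherwise values outside them are drawn as NA
rng   <- range(df$n, na.rm = TRUE)
stops <- c(rng[1], 10, 100, 1000, 10000, rng[2])

p <- ggplot(df, aes(x, y, fill = n)) +
  geom_tile() +
  scale_fill_gradientn(
    colours = c("white", "yellow", "orange", "red", "darkred", "black"),
    # `values` expects positions on the 0-1 scale, so rescale the stops
    values = scales::rescale(stops, from = rng),
    na.value = "grey80"  # missing counts get a neutral grey
  )
p
```

The legend still shows the raw counts on a linear bar, so most of the colour steps are squeezed into its bottom end; that is the trade-off of this approach compared with transforming the data.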

2 Answers


When you have a skewed distribution, you need to transform your data to get enough contrast. Keep in mind, though, that if, say, 40% of your counts are 1, those tiles will all have the same color no matter which transformation you use. However, if you have some very high counts, as you mention, and you do not transform your data, counts of 2, 4, etc. will show up in the same color as your 1s. Left- and right-skewed distributions call for different techniques; an interesting read can be found here: http://fmwww.bc.edu/repec/bocode/t/transint.html

Common transformations include log10 (or another base), sqrt, 1 / n, etc., or a combination of these.
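Instead of transforming inside `aes()`, the transformation can also be applied by the scale itself, which keeps the legend labelled in the original counts. A minimal sketch, again on made-up data shaped like the question's (1 to ~300000, some NA); the viridis palette and the grey for NA are my own choices. Note that the scale argument is called `trans` in older ggplot2 and was renamed `transform` in ggplot2 3.5.0, where `trans` still works but is deprecated.

```r
library(ggplot2)

# Hypothetical data: skewed counts from 1 to ~300000 plus 20 NA values
set.seed(42)
df <- expand.grid(x = 1:20, y = 1:20)
df$n <- round(10^runif(nrow(df), min = 0, max = 5.5))
df$n[sample(nrow(df), 20)] <- NA

p <- ggplot(df, aes(x, y, fill = n)) +
  geom_tile() +
  scale_fill_viridis_c(
    trans = "log10",                 # `transform = "log10"` in ggplot2 >= 3.5.0
    labels = scales::label_comma(),  # legend shows the original counts
    na.value = "grey80"              # missing counts get their own neutral colour
  )
p
```

A log scale needs strictly positive values, so counts of 0 would have to be handled first (for example with a pseudo-log or by adding 1).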

Here is an example with counts that produce a mostly black, spotted graph like yours on the default scale, followed by a few transformations that create more "contrast":

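```r
library(ggplot2)

set.seed(1)  # reproducible random positions
# Heavily skewed counts: 350 values between 10 and 1e8, plus 50 small ones (1 to 20)
n <- c(rep(c(1:10 %o% 10^(1:7)), 5), sample(1:20, 50, replace = TRUE))
dt <- data.frame(
  x = sample(1:40, length(n), replace = TRUE),
  y = sample(1:40, length(n), replace = TRUE),
  n = n
)

# Default linear scale: the few very large counts dominate the colour range
ggplot(dt, aes(x, y, fill = n)) + geom_tile()

# Square root
ggplot(dt, aes(x, y, fill = sqrt(n))) + geom_tile()

# Log base 10
ggplot(dt, aes(x, y, fill = log10(n))) + geom_tile()

# Log base 100: a constant multiple of log10, so the colours match; only the legend differs
ggplot(dt, aes(x, y, fill = log(n, 100))) + geom_tile()

# Square root of the log: compresses the large values even further
ggplot(dt, aes(x, y, fill = sqrt(log(n, 100)))) + geom_tile()
```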
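To put a number on how much contrast each transformation buys, here is a small sketch. It assumes, as ggplot2's continuous fill scales do, that the (transformed) data range is mapped linearly onto the colour gradient, and asks where a count of 20 lands on that 0-1 axis when counts run from 1 to 1e8, roughly the range of the example data above. The closer to 0, the closer its colour is to the colour of 1.

```r
library(scales)

# Smallest count, a small count of interest, and the largest count
counts <- c(1, 20, 1e8)

transforms <- list(
  identity    = function(n) n,
  sqrt        = sqrt,
  log10       = log10,
  log100      = function(n) log(n, 100),
  sqrt_log100 = function(n) sqrt(log(n, 100))
)

# Position of the count 20 after rescaling each transformed range to 0-1
position_of_20 <- sapply(transforms, function(f) rescale(f(counts))[2])
round(position_of_20, 4)
```

On the untransformed scale 20 sits at practically 0, and sqrt barely moves it, while log10 and log base 100 both put it at about 0.16 of the way along the gradient, and the square root of the log pushes it to about 0.40.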

Answered by Merijn van Tilborg
  • Hi, thanks for the help. I have tried using color=log10(n) but there was not much change in the output. – nubprog Jan 20 '22 at 09:07
  • You can try to define the cuts for the scale yourself; however, keep in mind that you cannot create variation in a heatmap where the data has none. In the most extreme case, a heatmap always shows equal n's in the same color. Your data is not shown, but this most likely means you have many low counts and a few very high "outliers". – Merijn van Tilborg Jan 20 '22 at 09:12
  • More transformations are also described here: http://fmwww.bc.edu/repec/bocode/t/transint.html, but perhaps you can share some sample data and, more importantly, the distribution of your counts. The best transformation depends on how the data is skewed. – Merijn van Tilborg Jan 20 '22 at 10:29
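Defining the cuts yourself could look like the following sketch: the counts are binned with base R's `cut()` at hand-picked, roughly logarithmic boundaries, and a discrete ColorBrewer palette is used for the classes. The data, break points, labels and palette are all assumptions for illustration; the breaks should be adapted to your own distribution.

```r
library(ggplot2)

# Hypothetical skewed counts (1 to ~300000) with 20 NA values
set.seed(1)
df <- expand.grid(x = 1:20, y = 1:20)
df$n <- round(10^runif(nrow(df), min = 0, max = 5.5))
df$n[sample(nrow(df), 20)] <- NA

# Hand-picked class boundaries; cut() uses right-closed intervals (a, b]
breaks <- c(0, 1, 10, 100, 1000, 10000, 100000, Inf)
labels <- c("1", "2-10", "11-100", "101-1,000", "1,001-10,000",
            "10,001-100,000", ">100,000")
df$n_class <- cut(df$n, breaks = breaks, labels = labels)

p <- ggplot(df, aes(x, y, fill = n_class)) +
  geom_tile() +
  # drop = FALSE keeps empty classes in the legend; NA counts are drawn grey
  scale_fill_brewer(palette = "YlOrRd", na.value = "grey80", drop = FALSE)
p
```

Because every class gets its own clearly different colour, small differences at the low end stay visible no matter how large the maximum count is.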

I usually use the pheatmap function, but you can do the same with your approach. In this example I use the brewer.pal() function to get a palette of colors, but you can create your own palette if you know the color codes. The output of this function is just a character vector with the color codes you chose.

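```r
library(pheatmap)
library(RColorBrewer)

# brewer.pal() returns a character vector of 9 hex colour codes
P <- brewer.pal(9, "Blues")

# `dataframe` stands for your own numeric data; pheatmap() expects a matrix
pheatmap(as.matrix(dataframe), scale = "none", color = P,
         treeheight_row = 0, treeheight_col = 0,  # hide the dendrograms
         main = "title")
```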
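Sequential ColorBrewer palettes such as "Blues" stop at 9 colours. If you want more steps from the same palette, base R's `colorRampPalette()` can interpolate between the codes that `brewer.pal()` returns. A minimal sketch; the number of steps (19) is an arbitrary choice:

```r
library(RColorBrewer)

# 9 hex codes from the sequential "Blues" palette (its maximum size)
blues_9 <- brewer.pal(9, "Blues")

# colorRampPalette() (grDevices) returns a function that interpolates
# between the given colours; here it stretches 9 blues into 19 steps
blues_19 <- colorRampPalette(blues_9)(19)
blues_19
```

The resulting vector can be passed as `color = blues_19` to pheatmap(). With a strongly skewed range, though, more steps alone still leave most tiles in the lightest shades unless the values are also transformed or binned.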

For example, this palette, which I use a lot, has 19 colors instead, which gives a larger variety!

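```r
# 19 colours from light cyan through yellows and oranges to dark red
# (note: '#B8860B' appears twice, so there are 18 distinct colours)
palette <- c('#E0FFFF', '#B0E0E6', '#FFFACD', '#FFFF00',
             '#FFD700', '#FFA500', '#FF8C00', '#FF7F50', '#DAA520',
             '#B8860B', '#B8860B', '#FA8072', '#FF6347', '#FF4500',
             '#DC143C', '#B22222', '#A52A2A', '#8B0000', '#800000')
```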
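Whether 19 colours actually add variation depends on how the counts are spread over them. A quick way to check is to split the range into 19 equal-width bins, one per palette colour (a simplifying assumption about how a heatmap assigns discrete colours), and look at the share of tiles that end up in the most common bin. The counts below are hypothetical, log-uniform from 1 to ~300000 like the question describes.

```r
# Hypothetical skewed counts: 1 to ~300000, spread log-uniformly
set.seed(3)
n <- round(10^runif(400, min = 0, max = 5.5))

n_colours <- 19  # length of the palette above

# Equal-width bins, one per colour, on the raw counts vs on log10(counts)
bins_linear <- cut(n, breaks = n_colours)
bins_log    <- cut(log10(n), breaks = n_colours)

# Share of tiles that get the single most common colour
top_share <- function(bins) max(table(bins)) / length(bins)
c(linear = top_share(bins_linear), log10 = top_share(bins_log))
```

On the raw counts roughly three quarters of the tiles should fall into the first bin, so they would all share the lightest colour, while on the log scale no single colour covers more than a small fraction of the tiles.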
Answered by MonicaOrt (edited by Nimantha)
  • Hi, thanks for the help. I am using ggplot: `ggplot(df, aes(x=OriginState, y=DestState, color=(n))) + geom_tile() + facet_grid(~ Year)` – nubprog Jan 20 '22 at 09:07
  • That would just replace the blacks with `#E0FFFF` and the handful of blues with `#800000`, which is the same lack of variation, just in different colors. – Merijn van Tilborg Jan 20 '22 at 09:15
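One detail in the code from the comment above: in `geom_tile()` the `color` aesthetic only sets the outline of each tile, while `fill` sets its interior, which would explain why mapping `color = log10(n)` changed little. Here is a sketch with `fill` instead, using hypothetical data that only borrows the column names from that comment (OriginState, DestState, Year, n); the state codes, years and viridis palette are my own assumptions.

```r
library(ggplot2)

# Hypothetical flow counts using the column names from the comment above
set.seed(11)
states <- c("CA", "FL", "NY", "TX", "WA")
df <- expand.grid(OriginState = states, DestState = states, Year = 2019:2021)
df$n <- round(10^runif(nrow(df), min = 0, max = 5.5))
df$n[sample(nrow(df), 5)] <- NA

# `fill` colours the inside of each tile; `color` would only colour the border
p <- ggplot(df, aes(x = OriginState, y = DestState, fill = log10(n))) +
  geom_tile() +
  scale_fill_viridis_c(na.value = "grey80") +  # NA counts drawn in grey
  facet_grid(~ Year)
p
```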