I am trying to plot two variables where N=700K. The problem is that there is too much overlap, so the plot becomes mostly a solid block of black. Is there any way of having a grayscale "cloud" where the darkness of the plot is a function of the number of points in a region? In other words, instead of showing individual points, I want the plot to be a "cloud" where the more points a region contains, the darker that region is.
It sounds like you're looking for a heatmap: http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/ – Oct 10 '11 at 15:01
8 Answers
One way to deal with this is with alpha blending, which makes each point slightly transparent, so regions with more points plotted on them appear darker.
This is easy to do in ggplot2:
library(ggplot2)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
ggplot(df, aes(x = x, y = y)) + geom_point(alpha = 0.3)
Another convenient way to deal with this (and probably a more appropriate one for the number of points you have) is hexagonal binning:
ggplot(df,aes(x=x,y=y)) + stat_binhex()
And there is also regular old rectangular binning (image omitted), which is more like your traditional heatmap:
ggplot(df,aes(x=x,y=y)) + geom_bin2d()
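To tie the binning approach back to the question, here is a minimal sketch at the question's scale (N = 700K) with a grayscale fill, so denser regions come out darker. The simulated data, the seed, the `bins = 100` setting and the grey endpoints are all assumptions for illustration; `geom_hex()` also needs the hexbin package installed.

```r
library(ggplot2)  # geom_hex() additionally requires the hexbin package

set.seed(1)                                    # arbitrary seed, only for reproducibility
n <- 700000                                    # sample size taken from the question
big <- data.frame(x = rnorm(n), y = rnorm(n))  # simulated stand-in for the real data

# Each hexagon is shaded by the number of points falling inside it:
# light grey for sparse bins, black for the densest ones.
p_cloud <- ggplot(big, aes(x = x, y = y)) +
  geom_hex(bins = 100) +
  scale_fill_gradient(low = "grey90", high = "black", name = "points") +
  theme_bw()

print(p_cloud)
```

Because only the bin counts are drawn rather than 700K individual points, the plot renders quickly; raising or lowering `bins` trades detail against smoothness.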

How can I change the colours? I am now getting a blue-to-black scale, whereas I would like a red, green, blue scale. – user1007742 Aug 12 '14 at 13:44
@user1007742 Use `scale_fill_gradient()` and specify your own low and high colors, or use `scale_fill_brewer()` and choose from one of the sequential palettes. – joran Aug 12 '14 at 14:04
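A short sketch of what that could look like on the rectangular-binning plot. The specific colours and the "Blues" palette are arbitrary choices for illustration. One caveat: the bin count is a continuous fill, and as far as I know `scale_fill_brewer()` targets discrete fills, so the sketch uses `scale_fill_distiller()` for the ColorBrewer variant, and `scale_fill_gradientn()` for a scale with more than two colours.

```r
library(ggplot2)

set.seed(2)  # arbitrary seed, only for reproducibility
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
base_plot <- ggplot(df, aes(x = x, y = y)) + geom_bin2d()

# Two-colour gradient with user-chosen endpoints
p_two <- base_plot + scale_fill_gradient(low = "yellow", high = "red")

# Gradient through several colours (blue -> green -> red)
p_rgb <- base_plot + scale_fill_gradientn(colours = c("blue", "green", "red"))

# Sequential ColorBrewer palette interpolated for a continuous fill
p_brewer <- base_plot + scale_fill_distiller(palette = "Blues", direction = 1)

print(p_rgb)
```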
@joran thanks, that is working now. How about changing the type/shape of the points? I get either hexagons or squares. I just want simple dots. When I use geom_point(), it gives me an error. – user1007742 Aug 12 '14 at 14:09
@user1007742 Well, it's called "hexagonal binning" for a reason! ;) It isn't plotting "points"; it is dividing the entire region into hexagonal (or rectangular) bins and then simply coloring the bins based upon how many points are in each bin. So the short answer is "you can't". If you want different shapes, you have to use `geom_point()` and plot each individual point. – joran Aug 12 '14 at 14:18
plus 1 for `stat_binhex()` looking very appealing and being a good way to describe scatterplot densities. – InfiniteFlash Dec 22 '17 at 23:19
@skan Google helps :-) scroll down to the bottom of the page https://www.statmethods.net/graphs/scatterplot.html came across this today when googling... – Simone Apr 02 '18 at 14:07
@skan When more isn’t better. Another approach is to be question driven and do data reduction accordingly if necessary. With the hexbin and data cluster visualisation I've been able to explore my 3000 observations quite nicely. But then - that is a tiny data set for a computer scientist. In case you haven’t found these yet: https://rpubs.com/stephenmoore56/135857 and https://www.r-graph-gallery.com/ Good luck! – Simone Apr 06 '18 at 17:11
See [my recent answer](https://stackoverflow.com/a/58523956/1870254) on how to effortlessly combine the best of both plots. – jan-glx Oct 23 '19 at 13:35
An overview of several good options in ggplot2:
library(ggplot2)
x <- rnorm(n = 10000)
y <- rnorm(n = 10000, sd=2) + x
df <- data.frame(x, y)
Option A: transparent points
o1 <- ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.05)
Option B: add density contours
o2 <- ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.05) +
  geom_density_2d()
Option C: add filled density contours
(Note that the points distort the perception of the colors underneath, so this may look better without them.)
# after_stat() needs ggplot2 >= 3.3.0; older versions use stat(level) instead
o3 <- ggplot(df, aes(x, y)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = 'polygon') +
  scale_fill_viridis_c(name = "density") +
  geom_point(shape = '.')
Option D: density heatmap
(Same note as C.)
# after_stat() needs ggplot2 >= 3.3.0; older versions use stat(density) instead
o4 <- ggplot(df, aes(x, y)) +
  stat_density_2d(aes(fill = after_stat(density)), geom = 'raster', contour = FALSE) +
  scale_fill_viridis_c() +
  coord_cartesian(expand = FALSE) +
  geom_point(shape = '.', col = 'white')
Option E: hexbins
(Same note as C.)
o5 <- ggplot(df, aes(x, y)) +
  geom_hex() +
  scale_fill_viridis_c() +
  geom_point(shape = '.', col = 'white')
Option F: rugs
Possibly my favorite option. Not quite as flashy, but visually simple and simple to understand. Very effective in many cases.
o6 <- ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.1) +
  geom_rug(alpha = 0.01)
Combine in one figure:
cowplot::plot_grid(
  o1, o2, o3, o4, o5, o6,
  ncol = 2, labels = 'AUTO', align = 'v', axis = 'lr'
)

This is a very nicely laid-out answer that I think deserves a few more upvotes. – Lalochezia Mar 26 '18 at 13:02
Gives me an error Error in scale_fill_viridis_c() : could not find function "scale_fill_viridis_c" – JustGettinStarted Sep 16 '18 at 17:17
Updated ggplot2, re-installed ggplot2 and reloaded ggplot2. Didn't fix the error. Separately installed the 'viridis' package, and that let me use the 'scale_fill_viridis' function but not the 'scale_fill_viridis_c' function, which still gives the same error – JustGettinStarted Sep 16 '18 at 17:24
oh i believe you. No issues there. Just trying to get to the bottom of the error. – JustGettinStarted Sep 16 '18 at 17:38
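One way to narrow down an error like this is to check whether the ggplot2 that is actually loaded exports the function at all. The sketch below does that and falls back to a plain two-colour gradient otherwise. My understanding is that the viridis scales were added to ggplot2 in version 3.0.0, but treat that as an assumption; the fallback colours are simply the two endpoints of the viridis palette.

```r
library(ggplot2)

# Which version is loaded in this session (an older copy in another library path can win)?
print(packageVersion("ggplot2"))

# Does the loaded ggplot2 export the continuous viridis fill scale?
has_viridis_c <- "scale_fill_viridis_c" %in% getNamespaceExports("ggplot2")

# Use it if present; otherwise approximate with a dark-purple-to-yellow gradient
fill_scale <- if (has_viridis_c) {
  scale_fill_viridis_c()
} else {
  scale_fill_gradient(low = "#440154", high = "#FDE725")
}

df <- data.frame(x = rnorm(5000), y = rnorm(5000))
p_fill <- ggplot(df, aes(x, y)) + geom_bin2d() + fill_scale
print(p_fill)
```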
You can also have a look at the ggsubplot package. This package implements features which were presented by Hadley Wickham back in 2011 (http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html).
(In the following, I include the "points"-layer for illustration purposes.)
library(ggplot2)
library(ggsubplot)
# Make up some data
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each = 5000),
                  xvar = c(rep(1:20, 250) + rnorm(5000, sd = 5), rep(16:35, 250) + rnorm(5000, sd = 5)),
                  yvar = c(rep(1:20, 250) + rnorm(5000, sd = 5), rep(16:35, 250) + rnorm(5000, sd = 5)))
# Scatterplot with subplots (simple)
ggplot(dat, aes(x = xvar, y = yvar)) +
  geom_point(shape = 1) +
  geom_subplot2d(aes(xvar, yvar,
                     subplot = geom_bar(aes(rep("dummy", length(xvar)), ..count..))),
                 bins = c(15, 15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
However, this feature really shines if you have a third variable to control for.
# Scatterplot with subplots (including a third variable)
ggplot(dat, aes(x = xvar, y = yvar)) +
  geom_point(shape = 1, aes(color = factor(cond))) +
  geom_subplot2d(aes(xvar, yvar,
                     subplot = geom_bar(aes(cond, ..count.., fill = cond))),
                 bins = c(15, 15), ref = NULL, width = rel(0.8), ply.aes = FALSE)
Another approach would be to use smoothScatter():
smoothScatter(dat[2:3])
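Since the question asks specifically for a grayscale cloud, here is a minimal base-graphics sketch of `smoothScatter()` with a white-to-black colour ramp at the question's sample size. The simulated data and the seed are assumptions; `nrpoints = 0` turns off the individual outlier points that `smoothScatter()` draws by default, so only the smoothed density remains.

```r
set.seed(3)                 # arbitrary seed, only for reproducibility
n <- 700000                 # sample size taken from the question
x <- rnorm(n)               # simulated stand-in data
y <- x + rnorm(n, sd = 2)

# Colour ramp from white (empty regions) to black (densest regions)
gray_ramp <- colorRampPalette(c("white", "black"))

# Smoothed 2D density drawn as an image; no individual points overlaid
smoothScatter(x, y, colramp = gray_ramp, nrpoints = 0)
```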

Unfortunately the package ggsubplot is not maintained anymore and has been removed from the CRAN repo... do you know of an alternative package which could be used to generate plots like the first two above? – dieHellste May 02 '19 at 09:28
If you use an old version of R & ggplot2, you should be able to get it working – majom May 03 '19 at 12:40
Alpha blending is easy to do with base graphics as well.
df <- data.frame(x = rnorm(5000),y=rnorm(5000))
with(df, plot(x, y, col="#00000033"))
The first six digits after the # are the color in RGB hex and the last two are the opacity, again in hex, so 33 (decimal 51 out of 255) is 20% opaque, or roughly 3/16.
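The arithmetic behind the "33" suffix can be checked directly in R. This is just a worked example of the hex-to-opacity conversion; the variable names are made up for illustration.

```r
alpha_hex  <- "33"                           # last two hex digits of "#00000033"
alpha_int  <- strtoi(alpha_hex, base = 16L)  # hex 33 = 3 * 16 + 3 = 51
alpha_frac <- alpha_int / 255                # opacity as a fraction of the 0..255 range

print(alpha_int)
print(alpha_frac)

# Building the same colour from its components reproduces the hex string
col_from_rgb <- rgb(0, 0, 0, alpha_int, maxColorValue = 255)
print(col_from_rgb)
```

So the suffix is neither 33% nor exactly 3/16: it is 51/255, which works out to 0.2.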

Just to add a bit of context, "#000000" is the color black and the "33" added to the end of the color is the degree of opacity---here, 33%. – Charlie Oct 11 '11 at 16:25
Minor note; the numbers are in hex so 33 is actually 3/16th opaque. – Aaron left Stack Overflow Dec 13 '11 at 14:50
You can also use density contour lines (ggplot2):
df <- data.frame(x = rnorm(15000),y=rnorm(15000))
ggplot(df,aes(x=x,y=y)) + geom_point() + geom_density2d()
Or combine density contours with alpha blending:
ggplot(df, aes(x = x, y = y)) +
  geom_point(colour = "blue", alpha = 0.2) +
  geom_density2d(colour = "black")

You may find the hexbin package useful. From the help page of hexbinplot:
library(hexbin)
mixdata <- data.frame(x = c(rnorm(5000), rnorm(5000, 4, 1.5)),
                      y = c(rnorm(5000), rnorm(5000, 2, 3)),
                      a = gl(2, 5000))
hexbinplot(y ~ x | a, mixdata)

+1 hexbin is my preferred solution - it can take a large # of points and then safely create a plot. I'm not sure that the others won't try to produce a plot, but simply shade things differently ex post. – Iterator Oct 15 '11 at 16:59
geom_pointdensity from the ggpointdensity package (recently developed by Lukas Kremer and Simon Anders (2019)) lets you visualize density and individual data points at the same time:
library(ggplot2)
# install.packages("ggpointdensity")
library(ggpointdensity)
df <- data.frame(x = rnorm(5000), y = rnorm(5000))
ggplot(df, aes(x=x, y=y)) + geom_pointdensity() + scale_color_viridis_c()

My favorite method for plotting this type of data is the one described in this question - a scatter-density plot. The idea is to do a scatter-plot but to colour the points by their density (roughly speaking, the amount of overlap in that area).
It simultaneously:
- clearly shows the location of outliers, and
- reveals any structure in the dense area of the plot.
Here is the result from the top answer to the linked question (image omitted):
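As a rough illustration of the scatter-density idea in R (not the code from the linked answer), base R's `densCols()` assigns each point a colour according to the estimated local density, and those colours can be passed straight to `plot()`. The simulated data, the seed, the grey-to-black ramp and the point size are all assumptions.

```r
set.seed(4)  # arbitrary seed, only for reproducibility
df <- data.frame(x = rnorm(20000), y = rnorm(20000))  # simulated stand-in data

# One colour per point: light grey in sparse areas, black in dense areas
dens_col <- densCols(df$x, df$y,
                     colramp = colorRampPalette(c("grey80", "black")))

# Every point is still drawn, so outliers stay visible as individual dots
plot(df$x, df$y, col = dens_col, pch = 20, cex = 0.5,
     xlab = "x", ylab = "y")
```

Unlike binning, this keeps every observation on the plot, which is why outliers remain visible while the dense core still shows structure.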

This is my favorite way too. See [my answer](https://stackoverflow.com/a/58523956/1870254) for how to achieve this in `R`. – jan-glx Oct 23 '19 at 13:29