22

Scatter plots can be hard to interpret when many points overlap, as such overlapping obscures the density of data in a particular region. One solution is to use semi-transparent colors for the plotted points, so that opaque region indicates that many observations are present in those coordinates.

Below is an example of my black and white solution in R:

MyGray <- rgb(t(col2rgb("black")), alpha=50, maxColorValue=255)
x1 <- rnorm(n=1E3, sd=2)
x2 <- x1*1.2 + rnorm(n=1E3, sd=2)
dev.new(width=3.5, height=5)
par(mfrow=c(2,1), mar=c(2.5,2.5,0.5,0.5), ps=10, cex=1.15)
plot(x1, x2, ylab="", xlab="", pch=20, col=MyGray)
plot(x1, x2, ylab="", xlab="", pch=20, col="black")

The advantages of using opacity to indicate point density

However, I recently came across this article in PNAS, which took a similar a approach, but used heat-map coloration as opposed to opacity as an indicator of how many points were overlapping. The article is Open Access, so anyone can download the .pdf and look at Figure 1, which contains a relevant example of the graph I want to create. The methods section of this paper indicates that analyses were done in Matlab.

For the sake of convenience, here is a small portion of Figure 1 from the above article:

Figure 1 from Flombaum et al. 2013, PNAS

How would I create a scatter plot in R that used color, not opacity, as an indicator of point density?

For starters, R users can access this Matlab color scheme in the install.packages("fields") library, using the function tim.colors().

Is there an easy way to make a figure similar to Figure 1 of the above article, but in R? Thanks!

rbatt
  • 4,677
  • 4
  • 23
  • 41

3 Answers3

38

One option is to use densCols() to extract kernel densities at each point. Mapping those densities to the desired color ramp, and plotting points in order of increasing local density gets you a plot much like those in the linked article.

## Data in a data.frame
x1 <- rnorm(n=1E3, sd=2)
x2 <- x1*1.2 + rnorm(n=1E3, sd=2)
df <- data.frame(x1,x2)

## Use densCols() output to get density at each point
x <- densCols(x1,x2, colramp=colorRampPalette(c("black", "white")))
df$dens <- col2rgb(x)[1,] + 1L

## Map densities to colors
cols <-  colorRampPalette(c("#000099", "#00FEFF", "#45FE4F", 
                            "#FCFF00", "#FF9400", "#FF3100"))(256)
df$col <- cols[df$dens]

## Plot it, reordering rows so that densest points are plotted on top
plot(x2~x1, data=df[order(df$dens),], pch=20, col=col, cex=2)

enter image description here

Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • This looks to be exactly the answer I was hoping for... thanks! – rbatt Jun 14 '13 at 00:05
  • @JoshOBrien: This is awesome! Two questions: 1) How were you able to add the image here in your answer? 2) How to add a legend here? – Shambho Apr 20 '14 at 17:29
  • @Shambho -- (1) You might need something like 100 reputation minimum, after which you'll get an image icon on your markup composition box. (2) AFAIK there's no prepackaged way. I would use something like `layout(matrix(1:2,ncol=2),width=c(75,25))` to split the plotting device into two plotting areas, placing the plot above in the first and a color bar in the second. For the color bar, I'd likely start with the `color.bar()` function given [here](http://www.colbyimaging.com/wiki/statistics/color-bars), after first removing from it the call to `dev.new()`. – Josh O'Brien Apr 20 '14 at 17:58
  • @Shambho -- To get it just right, you'll likely want to play around a bit with `par(mar=)` et al. (3) Glad you enjoyed this answer too. It's definitely among my personal 5-10 favorites among the 900+ answers I've given here! – Josh O'Brien Apr 20 '14 at 18:02
  • Inspiring! Thanks again! – Shambho Apr 20 '14 at 23:18
  • @JoshO'Brien How did you choose the colors in `cols <- colorRampPalette(c("#000099", "#00FEFF", "#45FE4F","#FCFF00", "#FF9400", "#FF3100"))(256)`? It doesn't seem that any of the colors in colors() match that. – rbatt Jun 30 '14 at 15:56
  • @rbatt -- Apologies for the much delayed reply! IIRC, I matched the colors to the figure from the article using a tool like [ColorPic](http://www.iconico.com/colorpic/). – Josh O'Brien Dec 09 '14 at 13:49
  • @Basilique That's a good question. To chase down an answer, I think one would need to follow the computations that `densCols()` uses to determine the colors it returns, which seem to be carried out by `grDevices:::.smoothScatterCalcDensity()` and, in turn, `KernSmooth::bkde2D()`. – Josh O'Brien May 25 '20 at 16:18
5

You can get a similar effect by doing hexagonal binning, divide the region into hexagons, color each hexagon based on the number of points in the hexagon. The hexbin package has functions to do this and there are also functions in the ggplot2 package.

Greg Snow
  • 48,497
  • 6
  • 83
  • 110
3

You can use smoothScatter for this.

colramp = colorRampPalette(c('white', 'blue', 'green', 'yellow', 'red'))
smoothScatter(x1, x2, colramp=colramp)
Matthew Plourde
  • 43,932
  • 7
  • 96
  • 113
  • Thanks for the response -- This is approximately the right idea, but I'd like to avoid the smoothing of the points. I tried playing with the bandwidth etc, but it doesn't seem like this function will be able to maintain individual points. – rbatt Jun 13 '13 at 18:26
  • Nice find! Didn't know about either this or the related `densCols()` function that I used in my answer just now. – Josh O'Brien Jun 13 '13 at 20:42