17

I am experimenting with ways to deal with overplotting in R, and one thing I want to try is to plot individual points but color them by the density of their neighborhood. In order to do this I would need to compute a 2D kernel density estimate at each point. However, it seems that the standard kernel density estimation functions are all grid-based. Is there a function for computing 2D kernel density estimates at specific points that I specify? I would imagine a function that takes x and y vectors as arguments and returns a vector of density estimates.

Ryan C. Thompson
  • 40,856
  • 28
  • 97
  • 159
  • Is there a specific reason why alpha blending or more standard binning approaches (like hexagonal binning) aren't sufficient? – joran Apr 24 '13 at 20:57
  • 2
    I want outliers to be clearly visible as individual points. Alpha belnding makes the outliers faint, and hexagonal binning turns them into entire hexagons instead of single points. Kernel density estimation on the entire grid does a good job for most of the data, but all the outlier points turn into little gaussian "puffs", so I want to instead compute the kernel density estimate and use it to assign a color to each point. This would produce essentially the same appearance as the grid-based approach wherever lots of points overlap, but would make outliers plainly visible as discrete points. – Ryan C. Thompson Apr 24 '13 at 21:04

2 Answers2

7

If I understand what you want to do, it could be achieved by fitting a smoothing model to the grid density estimate and then using that to predict the density at each point you are interested in. For example:

# Simulate some data and put in data frame DF
n <- 100
x <- rnorm(n)
y <- 3 + 2* x * rexp(n) + rnorm(n)
# add some outliers
y[sample(1:n,20)] <- rnorm(20,20,20)
DF <- data.frame(x,y)

# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x,y)

# create a new data frame of that 2d density grid
# (needs checking that I haven't stuffed up the order here of z?)
gr <- data.frame(with(dens, expand.grid(x,y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# Fit a model
mod <- loess(zgr~xgr*ygr, data=gr)

# Apply the model to the original data to estimate density at that point
DF$pointdens <- predict(mod, newdata=data.frame(xgr=x, ygr=y))

# Draw plot
library(ggplot2)
ggplot(DF, aes(x=x,y=y, color=pointdens)) + geom_point()

enter image description here

Or, if I just change n 10^6 we get

enter image description here

user2605553
  • 362
  • 1
  • 2
  • 9
Peter Ellis
  • 5,694
  • 30
  • 46
  • Yep, that's exactly what I wanted. Thanks! – Ryan C. Thompson Apr 25 '13 at 08:01
  • Actually, a loess model might result in excessive smoothing of the values. The kernel density is already doing smoothing. Is there a way to just to bilinear (or bicubic) interpolation from the grid values? – Ryan C. Thompson Apr 25 '13 at 19:02
  • If you carefully set the span parameter to loess to a fairly low value you will get behaviour pretty much what you want I think. There might be other ways too. – Peter Ellis Apr 25 '13 at 19:41
5

I eventually found the precise function I was looking for: interp.surface from the fields package. From the help text:

Uses bilinear weights to interpolate values on a rectangular grid to arbitrary locations or to another grid.

Ryan C. Thompson
  • 40,856
  • 28
  • 97
  • 159
  • 2
    I know this is old... but did `fields::interp.surface` work for you? It doesn't work for me with the above toy example because the dimensions don't match between `newdata` and the `interp.surface` output. See http://stackoverflow.com/questions/43896337/use-fieldsinterp-surface-to-interpolate-from-grid-to-irregular-points. – bstock May 10 '17 at 15:07