kde2d density comparison

Question

I have a question about the kde2d (Kernel density estimator). I am computing two different kde2d for two different sets of data in the same space of variables. When I compare both with a filled.contour2 or contours I see that the set with lower density of points in a scatter plot(Also has less points in the total with a factor of 10) has an higher density for the contours values. I was expecting that the set with higher point density will have higher density contours values, but like I said above is not the case. It has to be with the choice of bandwidth (h)? I am using equals h, and i tried to change but the result did not changed a lot. What is my error?

An example

a <-  runif(1000, 5.0, 7.5)
b <-  runif(1000, 2.0, 3.0)
c <-  runif(100000,5.0, 7.5)
d <-  runif(100000, 2.0, 3.0)
library(MASS)
abdens <- kde2d(a,b,n=100,h=0.5)
cddens <- kde2d(c,d,n=100,h=0.5)
mylevels <- seq(-2.4,30,0.9)
filled.contour2(abdens,xlab="a",ylab="b",xlim=c(5,7.5),ylim=c(2,3), 
                col=hsv(seq(0,1,length=length(mylevels))))
 plot(a,b)
contour(abdens,nlevels=5,add=T,col="blue")
plot(c,d)
contour(cddens,nlevels=5,add=T,col="orange")

Please include a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Thomas, Jun 27 '13 at 09:20
Where does `filled.contour2` come from? I can't find it any package, only here: http://wiki.cbr.washington.edu/qerm/sites/qerm/images/4/44/Filled.contour2.R — geneorama, Dec 03 '15 at 16:23

IRTFM · Answer 1 · 2013-06-28T22:21:29.303

I'm not sure I agree that the densities should be different in the uniform case. I would have expected a set with a higher number of randomly drawn points from a Normal distribution to have more support for extreme regions and therefore to have lower (estimated) density in the center. That effect might be also be occasionally apparent with bibariate Uniform draws with 1,000 points versus 100,000. I hope my modifications to your code are acceptable. It's easier to see the contours if done after the plots.

(The theoretic density would be the same in both these cases since the density distribution is normalized to an integral of 1.0. We are only looking at estimates with some expected artifacts from "edge" effects. In the case of univariate densities adding the information about bounds can be done with the desity functions in package::logspline.)

kde2d density comparison

1 Answers1