1

I'm trying to estimate the area of the 95% contour of a kde object from the ks package in R.

If I use the example data set from the ks package, I would create the kernel object as follow:

library(ks)
data(unicef)
H.scv <- Hscv(x=unicef)
fhat <- kde(x=unicef, H=H.scv)

I can easily plot the 25, 50, 75% contour using the plot function:

plot(fhat)

But I want to estimate the area within the contour.

I saw a similar question here, but the answer proposed does not solve the problem.

In my real application, my dataset is a time series of coordinates of an animal and I want to measure the home range size of this animal using a bivariate normal kernel. I'm using ks package because it allows to estimate the bandwith of a kernel distribution with methods such as plug-in and smoothed cross-validation.

Any help would be really appreciated!

Community
  • 1
  • 1
Marie Auger-Methe
  • 808
  • 1
  • 8
  • 20

2 Answers2

5

Here are two ways to do it. They are both fairly complex conceptually, but actually very simple in code.

fhat <- kde(x=unicef, H=H.scv,compute.cont=TRUE)
contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]],
                                     z=estimate,levels=cont["95%"])[[1]])
library(pracma)
with(contour.95,polyarea(x,y))
# [1] -113.677

library(sp)
library(rgeos)
poly <- with(contour.95,data.frame(x,y))
poly <- rbind(poly,poly[1,])    # polygon needs to be closed...
spPoly <- SpatialPolygons(list(Polygons(list(Polygon(poly)),ID=1)))
gArea(spPoly)
# [1] 113.677

Explanation

First, the kde(...) function returns a kde object, which is a list with 9 elements. You can read about this in the documentation, or you can type str(fhat) at the command line, or, if you're using RStudio (highly recommended), you can see this by expanding the fhat object in the Environment tab.

One of the elements is $eval.points, the points at which the kernel density estimates are evaluated. The default is to evaluate at 151 equally spaced points. $eval.points is itself a list of, in your case 2 vectors. So, fhat$eval.points[[1]] represents the points along "Under-5" and fhat$eval.points[[2]] represents the points along "Ave life exp".

Another element is $estimate, which has the z-values for the kernel density, evaluated at every combination of x and y. So $estimate is a 151 X 151 matrix.

If you call kde(...) with compute.cont=TRUE, you get an additional element in the result: $cont, which contains the z-value in $estimate corresponding to every percentile from 1% to 99%.

So, you need to extract the x- and y-values corresponding to the 95% contour, and use that to calculate the area. You would do that as follows:

fhat <- kde(x=unicef, H=H.scv,compute.cont=TRUE)    
contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]],
                                     z=estimate,levels=cont["95%"])[[1]])

Now, contour.95 has the x- and y-values corresponding to the 95% contour of fhat. There are (at least) two ways to get the area. One uses the pracma package and calculates it directly.

library(pracma)
with(contour.95,polyarea(x,y))
# [1] -113.677

The reason for the negative value has to do with the ordering of x and y: polyarea(...) is interpreting the polygon as a "hole", so it has negative area.

An alternative uses the area calculation routines in rgeos (a GIS package). Unfortunately, this requires you to first turn your coordinates into a "SpatialPolygon" object, which is a bit of a bear. Nevertheless, it is also straightforward.

library(sp)
library(rgeos)
poly <- with(contour.95,data.frame(x,y))
poly <- rbind(poly,poly[1,])    # polygon needs to be closed...
spPoly <- SpatialPolygons(list(Polygons(list(Polygon(poly)),ID=1)))
gArea(spPoly)
# [1] 113.677
jlhoward
  • 58,004
  • 7
  • 97
  • 140
  • 1
    Thank you!! You are right, it would be strange to estimate the area of the unicef kernel. I've edited my question to explain my real goal. Also, note that if you want the 95% isopleth of a home range, it appears that you need the 5th percentile. – Marie Auger-Methe Sep 21 '14 at 14:43
  • 3
    Quick note. If your distribution is multimodal, the answer of @jlhoward will only give you the area of the first mode/peak. This is because `contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]], z=estimate,levels=cont["95%"])[[1]])` only takes the first element of the list returned by contourLines. To get the area of all modes/peak you need a loop to estimates the area of each element of the list. – Marie Auger-Methe Sep 21 '14 at 17:44
  • What if i want to work with more than 2 dimensions? How can I implement extracting the x, y, and z-values corresponding to the 95% contour if i am working with 3D data? And what about if i want to work with more dimensions? – user1007742 Jan 17 '17 at 11:33
  • 2
    It actually seems like to get the 95% contour (within which lies 95% of points), we have to specify `levels=cont["5%"]` instead? The `ks::kde` function seems to reverse the meaning of "probability contour level". – Heisenberg Nov 17 '17 at 20:51
1

Another method would be to use the contourSizes() function within the kde package. I've also been interested in using this package to compare both 2D and 3D space use in ecology, but I wasn't sure how to extract the 2D density estimates. I tested this method by estimating the area of an "animal" which was limited to the area of a circle with a known radius. Below is the code:

set.seed(123)
require(GEOmap)
require(kde)
# need this library for the inpoly function

# Create a data frame centered at  coordinates 0,0
data = data.frame(x=0,y=0)

# Create a vector of radians from 0 to 2*pi for making a circle to
# test the area
circle = seq(0,2*pi,length=100) 

# Select a radius for your circle
radius = 10
# Create a buffer for when you simulate points (this will be more clear below)
buffer = radius+2

# Simulate x and y coordinates from uniform distribution and combine
# values into a dataframe

createPointsX = runif(1000,min = data$x-buffer, max = data$x+buffer)
createPointsY = runif(1000,min = data$y-buffer, max = data$y+buffer)
data1 = data.frame(x=createPointsX,y=createPointsY)

# Plot the raw data
plot(data1$x,data1$y)

# Calculate the coordinates used to create a cirle with center 0,0 and
# with radius specified above
coords = as.data.frame(t(rbind(data$x+sin(circle)*radius,
                           data$y+cos(circle)*radius)))
names(coords) = c("x","y")

# Add circle to plot with red line
lines(coords$x,coords$y,col=2,lwd=2)

# Use the inpoly function to calculate whether points lie within
# the circle or not.
inp = inpoly(data1$x, data1$y, coords)
data1 = data1[inp == 1,]

# Finally add points that lie with the circle as blue filled dots
points(data1$x,data1$y,pch=19,col="blue")

# Radius of the circle (known area)
pi * radius^2
#[1] 314.1593


# Sub in your own data here to calculate 95% homerange or 50% core area usage
H.pi = Hpi(data1,binned=T)
fhat = kde(data1,H=H.pi)
ct1 = contourSizes(fhat, cont = 95, approx=TRUE)

# Compare the known area of the circle to the 95% contour size
ct1
#     5% 
# 291.466 

I've also tried creating 2 un-connected circles and testing the contourSizes() function and it seems to work really well on disjointed distributions.

s_scolary
  • 1,361
  • 10
  • 21