
I am creating density plots with kde2d (MASS) on lat and lon data. I would like to know which points from the original data are within a specific contour.

I create 90% and 50% contours using two approaches. I want to know which points are within the 90% contour and which points are within the 50% contour. The points in the 90% contour will contain all of those within the 50% contour. The final step is to find the points within the 90% contour that are not within the 50% contour (I do not necessarily need help with this step).

# bw = data of 2 cols (lat and lon) and 363 rows
# two versions to do this: 
# would ideally like to use the second version (with ggplot2)

# version 1 (without ggplot2) 
library(MASS)
x <- bw$lon
y <- bw$lat
dens <- kde2d(x, y, n=200)

# the contours to plot
prob <- c(0.9, 0.5)
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sz <- sort(dens$z)
c1 <- cumsum(sz) * dx * dy 
levels <- sapply(prob, function(x) { 
    approx(c1, sz, xout = 1 - x)$y
})
plot(x,y)
contour(dens, levels=levels, labels=prob, add=TRUE)
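For anyone wondering what the `cumsum` step is doing, here is a minimal sketch on simulated data (the data and the `mass_above` helper are made up for illustration, they are not part of the original code). It checks that the level found for `prob = 0.9` leaves a mass of about 0.1 in the grid cells below it, so the contour encloses roughly 90% of the estimated density.

```r
library(MASS)

set.seed(1)                      # simulated stand-in for bw$lon / bw$lat
x <- rnorm(363, 45, 10)
y <- rnorm(363, 45, 10)
dens <- kde2d(x, y, n = 200)

dx <- diff(dens$x[1:2])          # grid cell width
dy <- diff(dens$y[1:2])          # grid cell height
sz <- sort(dens$z)               # density values, lowest first
c1 <- cumsum(sz) * dx * dy       # mass accumulated from the lowest cells upwards

# Density height at which the cells below it hold a mass of 1 - prob
level90 <- approx(c1, sz, xout = 1 - 0.9)$y

# Hypothetical helper: mass of all grid cells at or above a given level
mass_above <- function(level) sum(dens$z[dens$z >= level]) * dx * dy

total_mass <- max(c1)            # a bit below 1 because the grid stops at the data range
mass_above(level90)              # roughly total_mass - 0.1
```

Because `kde2d()` only evaluates the density over the range of the data by default, the total mass on the grid is slightly less than 1, which is why the enclosed mass comes out a little under 0.9 rather than exactly 0.9.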

And here is version 2 - using ggplot2. I would ideally like to use this version to find the points within the 90% and 50% contours.

# version 2 (with ggplot2; melt() comes from reshape2)
library(ggplot2)
library(reshape2)
getLevel <- function(x,y,prob) { 
    kk <- MASS::kde2d(x,y)
    dx <- diff(kk$x[1:2])
    dy <- diff(kk$y[1:2])
    sz <- sort(kk$z)
    c1 <- cumsum(sz) * dx * dy
    approx(c1, sz, xout = 1 - prob)$y
}

# 90 and 50% contours
L90 <- getLevel(bw$lon, bw$lat, 0.9)
L50 <- getLevel(bw$lon, bw$lat, 0.5)

kk <- MASS::kde2d(bw$lon, bw$lat)
dimnames(kk$z) <- list(kk$x, kk$y)
dc <- melt(kk$z)

# each "+" must end a line; a line that starts with "+" is parsed as a separate statement
p <- ggplot(dc, aes(x=Var1, y=Var2)) + geom_tile(aes(fill=value)) +
  geom_contour(aes(z=value), breaks=L90, colour="red") +
  geom_contour(aes(z=value), breaks=L50, colour="yellow") +
  ggtitle("90 (red) and 50 (yellow) contours of BW")

I create the plots with all of the lat and lon points plotted and 90% and 50% contours. I simply want to know how to extract the exact points that are within the 90% and 50% contours.

I have tried to find the z values (the elevation of the density plots from kde2d) that are associated with each row of lat and lon values but had no luck. I was also thinking I could add an ID column to the data to label each row and then somehow transfer that over after using melt(). Then I could simply subset the data that has values of z that match each contour I want and see which lat and lon they are compared to the original BW data based on the ID column.
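A minimal sketch of that idea, on simulated data standing in for `bw` (the `id`, `z`, `in90` and `in50` columns are names made up here). It looks up the density at the grid cell nearest to each point and compares it with the contour heights, which avoids `melt()` entirely. This is only a grid approximation, so points very close to a contour line may be classified differently than with an exact polygon test.

```r
library(MASS)

set.seed(1)
bw <- data.frame(lon = rnorm(363, 45, 10), lat = rnorm(363, 45, 10))  # simulated data
bw$id <- seq_len(nrow(bw))       # row label, so results can be traced back

dens <- kde2d(bw$lon, bw$lat, n = 200)

# Contour heights, computed as in version 1
prob <- c(0.9, 0.5)
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sz <- sort(dens$z)
c1 <- cumsum(sz) * dx * dy
levels <- sapply(prob, function(p) approx(c1, sz, xout = 1 - p)$y)

# Index of the nearest grid cell for each point (clamped to the grid)
ix <- pmin(pmax(round((bw$lon - dens$x[1]) / dx) + 1, 1), length(dens$x))
iy <- pmin(pmax(round((bw$lat - dens$y[1]) / dy) + 1, 1), length(dens$y))

# dens$z has one row per x value and one column per y value
bw$z <- dens$z[cbind(ix, iy)]

# A point counts as inside a contour if its density is at least that contour's height
bw$in90 <- bw$z >= levels[1]
bw$in50 <- bw$z >= levels[2]

# Points inside the 90% contour but outside the 50% contour
band <- bw[bw$in90 & !bw$in50, ]
head(band$id)
```

Treating "inside the contour" as "density at least as high as the contour level" is an assumption that matches the usual reading of these contours as highest-density regions.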

Here is a picture of what I am talking about:

[Image: scatter plot of the points (red) with the 50% (blue) and 90% (red) contours]

I want to know which red points are within the 50% contour (blue) and which are within the 90% contour (red).

Note: much of this code is from other questions. Big shout-out to all those who contributed!

Machavity
squishy
  • When you say "within the 90% and 50% contours" do you mean you want to know the lat/lon of all points for which the z-value is greater than 90% or 50% of all of the z values? – eipi10 May 28 '15 at 21:23
  • Edited in question - I want to find the red points that are within the 2 contour 'circles'. – squishy May 28 '15 at 21:26

2 Answers


You can use point.in.polygon() from the sp package. The code below reuses x, y, levels and prob from version 1 in the question.

## Interactively check points
plot(bw)
identify(bw$lon, bw$lat, labels=paste("(", round(bw$lon,2), ",", round(bw$lat,2), ")"))

## Points within polygons
library(sp)
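dens <- kde2d(x, y, n=200, lims=c(c(-73, -70), c(-13, -12)))  # widen the limits so the contours are not clipped
ls <- contourLines(dens, levels=levels)
# levels follows prob = c(0.9, 0.5), so ls[[1]] is the 90% contour and ls[[2]] the 50% contour
# (this assumes each level produces exactly one closed line)
inner <- point.in.polygon(bw$lon, bw$lat, ls[[2]]$x, ls[[2]]$y)
out <- point.in.polygon(bw$lon, bw$lat, ls[[1]]$x, ls[[1]]$y)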

## Plot
bw$region <- factor(inner + out)
plot(lat ~ lon, col=region, data=bw, pch=15)
contour(dens, levels=levels, labels=prob, add=TRUE)

[Image: points coloured by region, with the 90% and 50% contours overlaid]
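If a level happens to produce more than one closed line (for example a small island around an outlying cluster), indexing `ls[[1]]` and `ls[[2]]` no longer lines up with the two probabilities. Here is a hedged sketch, on simulated data, that groups the contour pieces by their `level` field instead. The `pad` and `inside_level` helpers are made up for illustration, and holes inside a contour are ignored.

```r
library(MASS)
library(sp)

set.seed(1)
bw <- data.frame(lon = rnorm(363, 45, 10), lat = rnorm(363, 45, 10))  # simulated data

# Hypothetical helper: widen the data range so the contours close inside the grid
pad <- function(v, f = 0.25) range(v) + c(-1, 1) * f * diff(range(v))
dens <- kde2d(bw$lon, bw$lat, n = 200, lims = c(pad(bw$lon), pad(bw$lat)))

# Contour heights, as in the question
prob <- c(0.9, 0.5)
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sz <- sort(dens$z)
c1 <- cumsum(sz) * dx * dy
levels <- sapply(prob, function(p) approx(c1, sz, xout = 1 - p)$y)

ls <- contourLines(dens, levels = levels)

# Hypothetical helper: TRUE for points inside (or on) any piece drawn at `level`
inside_level <- function(level) {
  inside <- logical(nrow(bw))
  for (cl in ls) {
    if (isTRUE(all.equal(cl$level, level))) {
      inside <- inside | point.in.polygon(bw$lon, bw$lat, cl$x, cl$y) > 0
    }
  }
  inside
}

in90 <- inside_level(levels[1])
in50 <- inside_level(levels[2])

# Points within the 90% contour that are not within the 50% contour
band <- bw[in90 & !in50, ]
```

`point.in.polygon()` returns 0 for points strictly outside a polygon and a positive code otherwise, so `> 0` counts points on the boundary as inside.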

Rorschach
  • Awesome! Simple and to the point. The answer is so obvious now with point.in.polygon. Super informative. – squishy May 29 '15 at 01:17
  • @jenesaisquoi, if I want to use the code to find whether a new pair of points falls within a 95% contour, what would I need to do? – user1560215 Mar 10 '17 at 14:10
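One possible way to handle the new-point case from the last comment, sketched on simulated data: compute the 95% contour from the original data once, then pass the new coordinates to `point.in.polygon()`. The `pad` and `in_contour95` helpers and the `new_pts` data frame are made up for illustration.

```r
library(MASS)
library(sp)

set.seed(1)
bw <- data.frame(lon = rnorm(363, 45, 10), lat = rnorm(363, 45, 10))  # simulated data

# Hypothetical helper: pad the grid so the 95% contour is not clipped at the data range
pad <- function(v, f = 0.25) range(v) + c(-1, 1) * f * diff(range(v))
dens <- kde2d(bw$lon, bw$lat, n = 200, lims = c(pad(bw$lon), pad(bw$lat)))

# Density height of the 95% contour, same recipe as in the question
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sz <- sort(dens$z)
level95 <- approx(cumsum(sz) * dx * dy, sz, xout = 1 - 0.95)$y
cl95 <- contourLines(dens, levels = level95)

# Hypothetical helper: TRUE where a new point lies inside any piece of the 95% contour
in_contour95 <- function(lon, lat) {
  inside <- logical(length(lon))
  for (cl in cl95) {
    inside <- inside | point.in.polygon(lon, lat, cl$x, cl$y) > 0
  }
  inside
}

# Made-up new points: one at the centre of the data, one far outside it
new_pts <- data.frame(lon = c(45, 200), lat = c(45, 200))
in_contour95(new_pts$lon, new_pts$lat)
```

The contour is built from the original data only, so the new points do not influence the density estimate.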

This is the best way I can think of. It converts the contour lines to a SpatialLinesDataFrame object using the ContourLines2SLDF() function from the maptools package, then uses a trick outlined in Bivand et al.'s Applied Spatial Data Analysis with R to convert the SpatialLinesDataFrame object to SpatialPolygons. These can then be used with the over() function to extract the points within each contour polygon:

##  Simulate some lat/lon data:
x <- rnorm(363, 45, 10)
y <- rnorm(363, 45, 10)

##  Version 1 (without ggplot2):
library(MASS)
dens <- kde2d(x, y, n=200)

##  The contours to plot:
prob <- c(0.9, 0.5)
dx <- diff(dens$x[1:2])
dy <- diff(dens$y[1:2])
sz <- sort(dens$z)
c1 <- cumsum(sz) * dx * dy 
levels <- sapply(prob, function(x) { 
    approx(c1, sz, xout = 1 - x)$y
})
plot(x,y)
contour(dens, levels=levels, labels=prob, add=TRUE)

##  Create spatial objects:
library(sp)
library(maptools)

pts <- SpatialPoints(cbind(x,y))

lines <- ContourLines2SLDF(contourLines(dens, levels=levels))

##  Convert SpatialLinesDataFrame to SpatialPolygons:
lns <- slot(lines, "lines")
polys <- SpatialPolygons( lapply(lns, function(x) {
  Polygons(list(Polygon(slot(slot(x, "Lines")[[1]], 
    "coords"))), ID=slot(x, "ID"))
    }))

##  Plot your points:
plot(pts)

##  Plot points within contours by using the over() function:
points(pts[!is.na( over(pts, polys[1]) )], col="red", pch=20)
points(pts[!is.na( over(pts, polys[2]) )], col="blue", pch=20)

contour(dens, levels=levels, labels=prob, add=TRUE)

[Image: points inside each contour highlighted in red and blue, with the contours overlaid]

Forrest R. Stevens
  • Awesome! Thanks for all of the additional information. I am going to have to accept 6pool's answer because it was a bit more direct. However, your answer gave me a ton of insight into all sorts of new possibilities! :) – squishy May 29 '15 at 01:16
  • Hi, I am trying to replicate the above code. Could someone explain what this is doing? `dx <- diff(dens$x[1:2]); dy <- diff(dens$y[1:2]); sz <- sort(dens$z); c1 <- cumsum(sz) * dx * dy; levels <- sapply(prob, function(x) { approx(c1, sz, xout = 1 - x)$y })` – user1560215 Jul 06 '15 at 20:53
  • The code finds the density heights (the contour levels) that correspond to the probabilities supplied in the `prob` vector. Look at the documentation of the `kde2d()` function and the data structure of `dens` for a clue as to what's going on. Basically, it uses the grid spacing in the X/Y directions (`dx`, `dy`) and the cumulative sum of the sorted Z values to find, for each probability, the density value below which a fraction `1 - prob` of the total mass lies. – Forrest R. Stevens Jul 06 '15 at 21:05
  • So if I wanted to get points which are in the 90% contour but not in 50% contour should out-inner give the results? – user1560215 Jul 08 '15 at 04:02
  • I'm confused a bit... The `over()` function gives you everything you'd need? To calculate those points within a certain band (say between the 0.5 and 0.9 contours) then you could do something like the following: `pts[!is.na( over(pts, polys[1]) ) & is.na( over(pts, polys[2]) )]` Hopefully I'm understanding your question? – Forrest R. Stevens Jul 08 '15 at 05:58