1

I've studied this question and created my own 4-contour map based on several thousands of pairs of longitude and latitude points, but I'm not getting the correct number of points inside each of the 4 contours using the points.in.polygon method mentioned in the above question.

Here is the code so far using MASS library:

 # use kde2d function to create kernel density estimates 
x <- pedestrian.df$longitude
y <- pedestrian.df$latitude
dens <- kde2d(x, y, n=200)

# create the contours to plot - 70%, 50%, 25%, 10% of density contained in each contour 
prob <- c(0.7, 0.5, 0.25, 0.1)
dx <- diff(dens$x[1:4])
dy <- diff(dens$y[1:4])
sz <- sort(dens$z)
c1 <- cumsum(sz) * dx * dy 
levels <- sapply(prob, function(x) { 
    approx(c1, sz, xout = 1 - x)$y
})

#create the contour plot using smoothScatter which smooths the collisions into kernel densities 

smoothScatter(x,y) + contour(dens, levels=levels, labels=prob, col = c("green", "yellow", "orange", "red"), lwd = 1.5, add=T)

This correctly generates the what I expected:

plot

I then tried to use the points.in.polygon function from sp library as in the answer to the above linked question:

ls <- contourLines(dens, level=levels)
zone_1 <- point.in.polygon(df$longitude, df$latitude, ls[[4]]$x, ls[[4]]$y)
zone_2 <- point.in.polygon(df$longitude, df$latitude, ls[[3]]$x, ls[[3]]$y)
zone_3 <- point.in.polygon(df$longitude, df$latitude, ls[[2]]$x, ls[[2]]$y)
zone_4 <- point.in.polygon(df$longitude, df$latitude, ls[[1]]$x, ls[[1]]$y)

But this results in incorrect number of points per zone or contour. I know this is not right because each contour should have progressively more points as the contour gets larger.

I tried looking at ls (a list that stores a list of all the x and y coordinates of the polygons), but there are 15 levels, not the 4 I intuitively would've thought would be there. There are even multiple levels among the 15 that have the same value. I suspect the answer to my issue lies in subsetting this list of lists correctly to include the 4 levels that correspond to my 4 contours, but ls[[1:7]]$x, ls[[1:7]]$y doesn't work.

Thanks for any help and let me know if I could clarify anything!

1 Answers1

2

I think pedestrian is your own data vs something in pkg and since it's not part of the question, we'll use a different one:

library(MASS)
library(sp)

attach(geyser)

data.frame(
  x = geyser$duration,
  y = geyser$waiting
) -> xdf

dens <- kde2d(xdf$x, xdf$y, n = 100)

prob <- c(0.7, 0.5, 0.25, 0.1)

dx <- diff(dens$x[1:4])
dy <- diff(dens$y[1:4])
sz <- sort(dens$z)
c1 <- cumsum(sz) * dx * dy 

levels <- sapply(prob, function(x) { 
  approx(c1, sz, xout = 1 - x)$y
})

smoothScatter(x,y) +
  contour(dens, levels=levels, labels=prob, col = c("green", "yellow", "orange", "red"), lwd = 1.5, add=TRUE)

enter image description here

The reason for 'multiple levels' is that each polygon in a given layer is separate so there are potentially > 1 per-level:

cl <- contourLines(dens, level=levels)

sort(table(sapply(cl, `[[`, "level")))
## 0.00519851181336958 0.00765971436995347  0.0107843979424224  0.0128423136194731 
##                   2                   3                   3                   3

So, just account for that when calculating points-per-polygon:

setNames(
  lapply(cl, function(poly) sum(sp::point.in.polygon(xdf$x, xdf$y, poly$x, poly$y))),
  sapply(cl, `[[`, "level")
) -> level_cts

str(level_cts)
## List of 11
##  $ 0.00519851181336958: int 91
##  $ 0.00519851181336958: int 174
##  $ 0.00765971436995347: int 78
##  $ 0.00765971436995347: int 57
##  $ 0.00765971436995347: int 74
##  $ 0.0107843979424224 : int 65
##  $ 0.0107843979424224 : int 34
##  $ 0.0107843979424224 : int 33
##  $ 0.0128423136194731 : int 42
##  $ 0.0128423136194731 : int 10
##  $ 0.0128423136194731 : int 3

Then we can sum them up:

sapply(
  split(level_cts, names(level_cts)),
  function(level) sum(unlist(level))
) -> pt_cts

pt_cts

## 0.00519851181336958 0.00765971436995347 
##                 265                 209 
##  0.0107843979424224  0.0128423136194731 
##                 132                  55 

And, get %:

pt_cts / nrow(xdf)
## 0.00519851181336958 0.00765971436995347 
##           0.8862876           0.6989967 
##  0.0107843979424224  0.0128423136194731 
##           0.4414716           0.1839465 

UPDATE

Rather than just compute percentages, we can also assign the level to the original data:

do.call(
  rbind.data.frame,
  lapply(cl, function(poly) { # iterate over each polygon
    # figure out which pts are in this polgyon
    which_pts <- as.logical(sp::point.in.polygon(xdf$x, xdf$y, poly$x, poly$y))
    tdf <- xdf[which_pts,] # assign them to a temp data frame
    tdf$level <- poly$level # add the level
    tdf
  })
) -> new_xdf

dplyr::glimpse(new_xdf)
## Observations: 661
## Variables: 3
## $ x     <dbl> 2.000000, 2.033333, 1.833333, 1.616667, 1.766667, 2.0000...
## $ y     <dbl> 77, 77, 81, 89, 73, 83, 84, 85, 79, 75, 91, 87, 86, 78, ...
## $ level <dbl> 0.005198512, 0.005198512, 0.005198512, 0.005198512, 0.00...

# while you likely want the level value, this adds columns for level # & prob
new_xdf$level_num <- as.integer(factor(new_xdf$level, levels, labels=1:length(levels)))
new_xdf$prob <- as.numeric(as.character(factor(new_xdf$level, levels, labels=prob)))

dplyr::glimpse(new_xdf)
## Observations: 661
## Variables: 5
## $ x         <dbl> 2.000000, 2.033333, 1.833333, 1.616667, 1.766667, 2....
## $ y         <dbl> 77, 77, 81, 89, 73, 83, 84, 85, 79, 75, 91, 87, 86, ...
## $ level     <dbl> 0.005198512, 0.005198512, 0.005198512, 0.005198512, ...
## $ level_num <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ prob      <dbl> 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0....

dplyr::count(new_xdf, level, level_num, prob)
## # A tibble: 4 x 4
##     level level_num  prob     n
##     <dbl>     <int> <dbl> <int>
## 1 0.00520         1 0.700   265
## 2 0.00766         2 0.500   209
## 3 0.0108          3 0.250   132
## 4 0.0128          4 0.100    55
hrbrmstr
  • 77,368
  • 11
  • 139
  • 205
  • Wow, this works quite well thank you! You missed the step of adding `cl <- contourLines(dens, level=levels)` after plotting, but otherwise this works as-is. If I could ask for a little additional detail: I was planning to identify the points that fall within each contour by creating a new field for `xdf` called 'contour' which is a factor or string that labels each point according to the level of the contour it is in. This would allow for me to look for patterns among the contours, sum rows by factor, etc. How would I add this to the original df? Can't quite figure it out – user10463769 Dec 02 '18 at 11:50
  • thx for catching that. 2am answers rarely work well ;-) i think i added what you asked for but lemme know if it needs tweaking. – hrbrmstr Dec 02 '18 at 12:48
  • I ran your code on geyser like above and it works perfectly there, but for some reason when I adapt it exactly for my own dataset, at the do.call step it creates a new dataframe that has close to 40% more observations than the original dataset. I checked to see if these new rows are duplicates but they're not... I can't tell what they're from since there doesn't appear to be a pattern to these additional rows. Any idea what could cause this? – user10463769 Dec 03 '18 at 00:14
  • Having the data set would help – hrbrmstr Dec 03 '18 at 01:39
  • Thanks again for your patience and godly R skills :) I posted the dataset to [GitHub](https://github.com/jasonukim/pedestrian) - the only fields I'm using for this specific question are longitude and latitude – user10463769 Dec 03 '18 at 03:55