5

Trying to find, for each point in a SpatialPointsDataFrame, the distance to the closest point in a second SpatialPointsDataFrame (equivalent to the "nearest" tool in ArcGIS for two SpatialPointDataFrames).

I can do the naive implementation by calculating all pairwise distances using gDistance and taking the min (like answer 1 here), but I have some huge datasets and was looking for something more efficient.

For example, here's a trick with knearneigh for points in same dataset.

Cross-posted on r-sig-geo

Henrik
  • 65,555
  • 14
  • 143
  • 159
nick_eu
  • 3,541
  • 5
  • 24
  • 38
  • It looks like `spDists` in the `sp` package might work for what you want. It's first two arguments appear to be different matrices that could be used to represent two sets of points as in your example. Worth a look anyway. – lmo May 19 '16 at 20:59
  • @Imo thanks! Looks like it's still calculating every pair, so may have same performance issue. Will check against gDistance, but seems like they're doing roughly the same thing. – nick_eu May 19 '16 at 21:02
  • Please don't cross-post. – Josh O'Brien May 19 '16 at 22:32
  • @JoshO'Brien Why not? If I get an answer on one page I always bring it back to the other, but lists serve different populations -- this way knowledge diffuses! – nick_eu May 19 '16 at 22:44
  • 1
    Mostly because it ends up being an added burden on the support community. For instance I wouldn't have spent the time to put together this answer if I'd known Michael Sumner had already given you an answer over on R-sig-geo. (Of course, it's my bad for not reading your question all the way through to see the cross-posting.) That said, thanks at least for leaving the note that you had cross-posted. – Josh O'Brien May 19 '16 at 22:48
  • @JoshO'Brien Ah, ok! Thanks for the heads up -- I'll refrain the the future. But thank you for the answer! – nick_eu May 19 '16 at 22:50
  • You bet. I'm happy to have learned about **nabor**. – Josh O'Brien May 19 '16 at 22:50
  • @JoshO'Brien For what it's worth, your answer seems to include a more generalized library -- I have a set of GIS in R tutorials at http://www.nickeubank.com/gis-in-r/ I'll be sure to add SearchTrees to it! – nick_eu May 19 '16 at 22:51
  • @JoshO'Brien up: http://www.nickeubank.com/wp-content/uploads/2015/10/RGIS7_speedingupgis.html . Thanks! – nick_eu May 19 '16 at 23:05
  • See my answer here : http://stackoverflow.com/questions/21977720/r-finding-closest-neighboring-point-and-number-of-neighbors-within-a-given-rad/43378579#43378579 . The Dist matrix is what you want I think ..... – Nico Coallier Apr 12 '17 at 21:12

2 Answers2

11

The SearchTrees package offers one solution. Quoting from its documentation, it, "provides an implementation of the QuadTree data structure [which it] uses to implement fast k-Nearest Neighbor [...] lookups in two dimensions."

Here's how you could use it to quickly find, for each point in a SpatialPoints object b, the two nearest points in a second SpatialPoints object B

library(sp)
library(SearchTrees)

## Example data
set.seed(1)
A <- SpatialPoints(cbind(x=rnorm(100), y=rnorm(100)))
B <- SpatialPoints(cbind(x=c(-1, 0, 1), y=c(1, 0, -1)))

## Find indices of the two nearest points in A to each of the points in B
tree <- createTree(coordinates(A))
inds <- knnLookup(tree, newdat=coordinates(B), k=2)

## Show that it worked
plot(A, pch=1, cex=1.2)
points(B, col=c("blue", "red", "green"), pch=17, cex=1.5)
## Plot two nearest neigbors
points(A[inds[1,],], pch=16, col=adjustcolor("blue", alpha=0.7))
points(A[inds[2,],], pch=16, col=adjustcolor("red", alpha=0.7))
points(A[inds[3,],], pch=16, col=adjustcolor("green", alpha=0.7))

enter image description here

Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • Hi Josh, I have the same question, but your answer doesn't actually answer the full question. This provides the coordinates to of the closest points in A, but not the *distance* from the points in B to the points in A. Could you possibly add the extra code that would provide a vector (or 2 vectors) of the min distances? A vector that could then be added back as a data column within the shapefile A? – Leah Bevis Jun 11 '20 at 21:29
  • @LeahBevis This should get you the distances to each of the points indexed in `inds`: `t(sapply(seq_len(nrow(inds)), function(i) spDists(B[i, ], A[inds[i, ],])))`. HTH. – Josh O'Brien Jun 12 '20 at 00:15
  • 1
    Thanks so much! This was brilliant. For future users, I didn't understand the inds: bit, and didn't need the t(), but it worked when I applied it like this: `distkm <- sapply(seq_len(nrow(inds)), function(i) spDists(tzprice.shp[i, ], market.shp[inds[i, ],]))` – Leah Bevis Jun 12 '20 at 16:41
  • Also, I THINK the answer is in KM, even though I didn't specify longlat = FALSE w/in spDists. Output was the same whether I specified that option or not. Also, feel free to check out this related post : ) Still wondering about fastest way to get distance to nearest lines in meters/km. https://stackoverflow.com/questions/62335329/fastest-cartesian-distance-r-from-each-point-in-spatialpointsdataframe-to-clos – Leah Bevis Jun 12 '20 at 16:45
0

Another suggestion from R-Sig-Geo is the knn function in the nabor library.

nick_eu
  • 3,541
  • 5
  • 24
  • 38
  • Hi Nick, I found your cross-posting (always nice to actually give the link; I found this https://stat.ethz.ch/pipermail/r-sig-geo/2016-May/024452.html) but it doesn't seem to provide distance in km or m. Seems like it's some sort of Euclidean distance based on the coordinates? Did you actually use this knn function (or the trick from Josh above) to calculate physical distances? If so, could you share? Thanks!! – Leah Bevis Jun 11 '20 at 21:31