8

I'm just starting to learn R but would like project done sooner rather than later. It's rather simple: I have an X column and a Y column consisting of X coordinates and Y coordinates. (Working in the NAD27 coordinate system). Going from the first coordinate, I'd like to find the nearest point within the data set and then move onto the next coordinate and find it's nearest point within the same data set. Ideally, it would go through each point and determine the closest point.

point x         y
1     1601774   14544454
2     1616574   14579422
3     1608698   14572922
4     1602948   14572990
5     1607355   14573871
6     1615336   14578178
7     1603398   14574495
8     1605153   14570727
9     1606758   14573845
10    1606655   14570953
jhenson
  • 103
  • 1
  • 4
  • 3
    Welcome to Stack Overflow! Please include a small sample dataset (I'm unfamiliar with the NAD27 coordinate system and I would assume others might be, too). Further please include any code you've tried to perform this operation. – josliber Sep 16 '15 at 21:25
  • Why negative points? It is a good question. Ask him to edit the question. – Soheil Sep 16 '15 at 21:31
  • 4
    `sp::spDists` or `rgeos::gDistance` ought to help, but you'll need to try some things first (folks are not just going to write code) – hrbrmstr Sep 16 '15 at 21:33
  • how careful/sophisticated do you need to be about the coordinate system? Will a solution based on Euclidean distances be adequate? – Ben Bolker Sep 16 '15 at 21:36
  • Euclidean should work for this. It doesn't have to be too precise. – jhenson Sep 16 '15 at 21:40
  • @Ben, if we assume Euclidean distances, how would you approach? I am curious what is the best way to compare each row with whole data. – Soheil Sep 16 '15 at 21:41
  • 2
    An obvious approach would be to use a distance matrix. This can take a lot of memory though, `ds <- as.matrix(dist(dat, diag=TRUE, upper=TRUE)); diag(ds) <- Inf; apply(ds, 1, which.min)`. But there are probably cleverer ways. – Rorschach Sep 16 '15 at 21:45
  • 2
    it really depends on the programmer time/memory/computation time tradeoff. @bunk is right about the quick&dirty approach. Googling "r find nearest point" gets you a lot of choices. `sqrt(rowSums(sweep(mydata[-i,],MARGIN=1,FUN="-",mydata[i,])^2))` gets you all of the distances from point `i` to all other points ... – Ben Bolker Sep 16 '15 at 21:56
  • See also [this post](http://stackoverflow.com/q/27321856/489704) – jbaums Sep 16 '15 at 22:05
  • 1
    k-d trees could also be an option: http://stackoverflow.com/a/30263451/3093387 – josliber Sep 16 '15 at 22:05
  • If I made a matrix then, I'd have to identify the coordinate that corresponds with the smallest value in column 1. Then loop it for each other coordinate. Correct? – jhenson Sep 16 '15 at 22:24

1 Answers1

10

Here's one way, using the RANN package. The approach is similar to that shown in this post, but is adapted for a single set of points (the linked post was about finding the nearest point in set A to each point in set B).

xy <- read.table(text='point x         y
1     1601774   14544454
2     1616574   14579422
3     1608698   14572922
4     1602948   14572990
5     1607355   14573871
6     1615336   14578178
7     1603398   14574495
8     1605153   14570727
9     1606758   14573845
10    1606655   14570953', header=TRUE, row.names=1)

library(RANN)
closest <- nn2(data=xy, k=2)[[1]]

Above, we supply your single set of points, xy, to the data argument, and specify that we want nn2 to find the two nearest points to each point (because the nearest point is the focal point itself). The nn2 function returns a list with two elements: a vector (matrix, in this case) of indices of each of the k nearest points (for each queried point); and a vector (matrix) of the distances. I'm assuming we're not interested in the distances, so above we subset the result to the first element.

For our problem, the result is a two-column matrix giving the index of the queried point in the first column and the index of the nearest point in the second.

closest

##       [,1] [,2]
##  [1,]    1    8
##  [2,]    2    6
##  [3,]    3    5
##  [4,]    4    7
##  [5,]    5    9
##  [6,]    6    2
##  [7,]    7    4
##  [8,]    8   10
##  [9,]    9    5
## [10,]   10    8

To get a matrix of coordinates of the nearest points, you could use:

xy[closest[, 2], ]

By default nn2 uses a kd tree - you might want to experiment with treetype='bd'.

Community
  • 1
  • 1
jbaums
  • 27,115
  • 5
  • 79
  • 119
  • 1
    This looks awesome. Just to make sure I'm understanding correctly, the closest point to coordinate 1, is coordinate 8? And with coordinate 2, it's coordinate 6, etc? – jhenson Sep 17 '15 at 13:43
  • @jhenson - that's correct (well, the closest point to coord 1 is coord 1, and the second closest is 8, but yes, you get the picture) – jbaums Sep 17 '15 at 13:45
  • 1
    Right! Thanks so much, it's working awesome with my data so far. – jhenson Sep 17 '15 at 13:46