7

I have a set of points with longitude and latitude coordinates. I want to find out the number of points within a specific radius from each point. I have looked at the RANN and FNN packages and the only relevant function I found is the nn2() in the RANN package. though, I do not want to preset the maximum or minimum points that have to be identified (k variable in the nn2 function). in addition, even though I have tried several different values for k(number of points) and radius in nn2 I always get the same results. even when the radius is set to very small or zero. here's an example of the code I've used

points<- nn2(mydata, k=100, radius = 0.02)

any ideas of how to do this in R?

Alex
  • 131
  • 2
  • 10
  • 1
    Could you include the code you tried in the question (verbatim, not a description). Also, the question is not precisely defined. Do you mean within a distance on the earth's surface. If so you can find other questions that explain how to calculate geographic distance between two points. Then subset all that have distance less than x. Othe possible interpretation is within a spherical radius in 3d space, or within a planar radius on a projected map surface. – dww Aug 18 '16 at 02:12
  • example code included. I have managed to calculate distances between points, using this example http://stackoverflow.com/questions/21977720/r-finding-closest-neighboring-point-and-number-of-neighbors-within-a-given-rad. however, even if the question asks something similar, it is not answered there. and yes I am trying to find the number of points within a radius on the earth's surface. I am looking for all the points included in all radiuses of all points in the dataset – Alex Aug 18 '16 at 02:18
  • Welcome to SO. You could still improve your question. Please read [how to provide minimal reproducible examples in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#answer-5963610). Then edit & improve it accordingly. A good post usually provides minimal input data, the desired output data & code tries - all copy-paste-run'able in a new/clean R session. – lukeA Aug 18 '16 at 08:31

1 Answers1

8

To get what you want from nn2, you need to:

  1. Specify the searchtype to be "radius". By not specifying the searchtype, the searchtype is set to standard and the radius input is ignored.
  2. Specify k to be nrow(mydata) because k is the maximum number of nearest neighbors to compute.

The manner in which you are calling nn2 will return the 100 nearest neighbors in mydata for each point in mydata. Consequently, you will get the same result for any radius. For example:

library(RANN)
set.seed(123)

## simulate some data
lon = runif(100, -99.1, -99)
lat = runif(100, 33.9, 34)

## data is a 100 x 2 matrix (can also be data.frame)
mydata <- cbind(lon, lat)

radius <- 0.02   ## your radius
res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)
## prints total number of nearest neighbors (for all points) found using "radius"
print(length(which(res$nn.idx>0)))
##[1] 1224
res1 <- nn2(mydata, k=100, radius = radius) 
## prints total number of nearest neighbors (for all points) found using your call
print(length(which(res1$nn.idx>0)))
##[1] 10000

radius <- 0.03   ## increase radius
res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)
## prints total number of nearest neighbors (for all points) found using "radius"
print(length(which(res$nn.idx>0)))
##[1] 2366
res1 <- nn2(mydata, k=100, radius = radius)
## prints total number of nearest neighbors (for all points) found using your call
print(length(which(res1$nn.idx>0)))
##[1] 10000

Note that according to its documentation ?nn2, nn2 returns a list with two elements:

  1. nn.idx: a nrow(query) x k matrix where each row contains the row indices of the k nearest neighbors in mydata to the point at that row in the collection of query points in query. In both of our calls, query=mydata. When called with searchtype="radius", if there are m < k neighbors within the given radius, then k - m of those index values will be set to 0. Since the set of query points is the same as mydata, the index to the point itself will be included.

  2. nn.dist: a nrow(query) x k matrix where each element contains the Euclidean distances for the corresponding nearest neighbor in nn.idx. Here, if the corresponding element in nn.idx is 0, then the value in nn.dist is set to 1.340781e+154.

With your call, you get 100 nearest neighbors for each point in mydata, hence length(which(res1$nn.idx>0))==10000 no matter what the radius is in the example.

Finally, you should note that because nn2 returns two nrow(mydata) x nrow(mydata) in your case, it can very easily overwhelm your memory if you have a lot of points.


Updated to specifically produce the result of getting the count of neighbors within a given radius.

To compute the number of neighbors within a radius of each point in the data, call nn2 as such

res <- nn2(mydata, k=nrow(mydata), searchtype="radius", radius = radius)

Then, do this:

count <- rowSums(res$nn.idx > 0) - 1

Notes:

  1. Since query=mydata and k=nrow(mydata) the resulting res$nn.idx will be nrow(mydata) x nrow(mydata) as explained above.
  2. Each row i of res$nn.idx corresponds to row i in query, which is the i-th point in query=mydata. Call this i-th point p[i].
  3. Each row i of res$nn.idx contains the row indices of the neighbors of p[i] AND zeroes to fill that row in the matrix (because not all points in mydata will be within the radius of p[i]).
  4. Therefore, the number of neighbors of p[i] can be found by finding those values in the row that are greater than 0 and then counting them. This is what rowSums(res$nn.idx > 0) does for each row of res$nn.idx.
  5. Lastly, because query=mydata, the point being queried is in the count itself. That is, a point is the nearest neighbor to itself. Therefore subtract 1 from these counts to exclude that.

The resulting count will be a vector of the counts. The i-th element is the number of neighbors within the radius to the i-th point in query=mydata.

Hope this is clear.

aichao
  • 7,375
  • 3
  • 16
  • 18
  • thank you for your answer. Though I do not understand two things. first is the print function. what is the difference of printing results using radius and call? second: according to your notations, what I understand is that as far as I am using mydata as query, overlapping points are included in the result. how can I find the number of points within the same dataset? to simplify this: I want to find the number of points, for every point within the given radius of one dataset – Alex Aug 19 '16 at 17:49
  • @Alex: I've edited my answer to hopefully clarify your second question. For your first question, I use `print` to simply output the result of the computation from a script to the console in RStudio. There is really no need for `print`. Also, I was outputting the difference between the answer and how you were calling `nn2` to illustrate why you reported that the answer seems always to be the same no matter what value the `radius` is set. – aichao Aug 19 '16 at 18:31
  • thank you so much! that's a solid answer! everything is clear now – Alex Aug 19 '16 at 23:32