1

I am trying to find the k-nearest neighbor on a sphere using R.

As I deal with million of points brute force is not an option.

Does anyone know how I could do? Thank you

user3507085
  • 700
  • 5
  • 17
  • what is your definition of distance on the sphere? If you can find a k-NN implementation that allows an arbitrary distance definition then you can plug in (e.g.) a great-circle-distance calculator. I will say that the combination of (1) a somewhat unusual specification and (2) a need for a fast/efficient algorithm probably means you're going to have to do some programming yourself rather than grabbing an off-the-shelf implementation ... – Ben Bolker Nov 23 '16 at 14:47
  • on second thought, it's not such an unusual specification. (I had "k-means clustering" in my head.) – Ben Bolker Nov 23 '16 at 14:52
  • What's the radius of your sphere vs the neighbors distance ? – Tensibai Nov 23 '16 at 15:02
  • In case this could be of use: (the second part mainly) [an answer by myself here](http://stackoverflow.com/a/39596737/3627607) – Tensibai Nov 23 '16 at 15:04
  • Thank you for the comments, actually the "sphere" I am dealing with is the Earth ;). The neighbors distance can be of the order of the radius of the Earth , so plane approximation is not possible here :/ – user3507085 Nov 23 '16 at 15:15
  • To Ben Bolker: the distance would be in R the outputs of pointDistance(...,lonlat=T) function, i.e., the geographic distance between two sets of points on the WGS ellipsoid – user3507085 Nov 23 '16 at 15:19
  • 1
    also: http://stackoverflow.com/questions/10549402/kdtree-for-longitude-latitude ? – Ben Bolker Nov 23 '16 at 15:30
  • Thank for the link. Indeed convert spherical data to 3D cartesian then apply a 3D knn search could work. I will try it – user3507085 Nov 23 '16 at 15:38

1 Answers1

3

Using the link provided by Ben Bolker I manage to solve my pb.

lonlat2xyz=function (lon, lat, r) 
{
lon = lon * pi/180
lat = lat * pi/180
if (missing(r)) 
    r <- 6378.1
x <- r * cos(lat) * cos(lon)
y <- r * cos(lat) * sin(lon)
z <- r * sin(lat)
return(cbind(x, y, z))
}

lon1=runif(100,-180,180);lon2=runif(100,-180,180);lat1=runif(100,-90,90);lat2=runif(100,-90,90)

xyz1=lonlat2xyz(lon1,lat1)
xyz2=lonlat2xyz(lon2,lat2)

library(nabor)

out=knn(data=xyz1,query = xyz2,k=20)

library(maps)

map()
points(lon1,lat1,pch=16,col="black")
points(lon2[1],lat2[1],pch=16,col="red")
points(lon1[out$nn.idx[1,]],lat1[out$nn.idx[1,]],pch=16,col="blue")

Note the distances given by knn function are NOT the geographical distances !

user3507085
  • 700
  • 5
  • 17