3

I am dealing with space-time data (x,y,z,t) and I would like to find the k nearest neighbors that satisfy some conditions (eg < 10 km and < 10 s)

I could not find any implementation of such "constrained nearest neighbor" algorithm (although some research papers focus on this question).

Does any one know where I could find it (optimally implemented in R) ? Thank you

user3507085
  • 700
  • 5
  • 17
  • Possible duplicate of [Find K nearest neighbors, starting from a distance matrix](http://stackoverflow.com/questions/23449726/find-k-nearest-neighbors-starting-from-a-distance-matrix) – CIsForCookies Apr 19 '17 at 12:00
  • Thanks for your comment. Could you explain with greater details how can I use the question you mention to solve my pb? Are you suggesting to give artificially huge distance to combination of points that do not respect the conditions? However compute the whole distance matrix for every combinations of points of my datasets (with millions of points) would take days ! – user3507085 Apr 19 '17 at 12:37
  • Possibly related: http://stackoverflow.com/questions/3962775/how-to-efficiently-find-k-nearest-neighbours-in-high-dimensional-data, http://stackoverflow.com/questions/4350215/fastest-nearest-neighbor-algorithm, http://stackoverflow.com/questions/20398799/find-k-nearest-points-to-point-p-in-2-dimensional-plane, and https://www.google.com/search?q=find+k+nearest+neighbors – גלעד ברקן Apr 19 '17 at 12:53
  • Thank you but those questions don't deal with the problem of adding a constraint to the knn algorithm – user3507085 Apr 19 '17 at 12:54
  • Good point :) There are some articles referenced here: https://www.google.com/search?q=constrained+nearest+neighbor+queries – גלעד ברקן Apr 19 '17 at 13:04
  • So your data is a set of points (x, y, z, t)? 4-dimensional? If yes you could partition the data into hypercubes with edge length 10, e.g. this would be a cube (0:10, 0:10, 0:10, 0:10) so you will only have to look at the cube the point is located in and the surrounding cubes, which I think is 3^4 = 81 cubes, but this could still be very beneficial because there can be a lot lot more hypercubes with data points in them. – maraca Apr 19 '17 at 13:14
  • Thank you, yes this is (x,y,z,t) data but how to add conditions (eg <10km & 10 s) to what you suggest? – user3507085 Apr 19 '17 at 14:00
  • I think it depends a lot on the distribution of the points, e.g. if they are more or less equally distributed in a range from xmin, ymin, zmin, tmin to xmax, ymax, zmax, tmax then this reduction of the points that have to be considered can be implemented pretty efficiently. – maraca Apr 19 '17 at 14:43
  • Not only the distribution of the points matters, also the differences between the mins and maxes and the precision of your conditions, e.g. is it always a multiple of 1, 2, 5, 10, 0.1,...? – maraca Apr 19 '17 at 16:02
  • I agree. There are some research papers that evaluated this but there is no implementation publicly available ... – user3507085 Apr 20 '17 at 12:39

0 Answers0