17

I have a matrix having around 1000 geospatial points(Longitude, Latitude) and i am trying to find the points that are in 1KM range.

NOTE: "The points are dynamic, Imagine 1000 vehicles are moving, so i have to re-calculate all distances every few seconds"

I did some searches and read about Graph algorithms like (Floyd–Warshall) to solve this, and I ended up with many keywords, and i am kinda lost now. I am considering the performance and since the search radius is short, I will not consider the curvature of the earth.

Basically, It appears that i have to calculate the distance between every point to every other point then sort the distances starting from every point in the matrix and get the points that are in its range. So if I have 1000 co-ordinates, I have to perfom this process (1000^2-1000) times and I do not beleive this is the optimum solution. Thank You.

Alaa Qutaish
  • 239
  • 1
  • 2
  • 12
  • 1
    I think the problem relates to the closest pair of points problem, check this for further reference http://en.wikipedia.org/wiki/Closest_pair_of_points_problem – Rajesh Pantula Jan 19 '12 at 06:53
  • Are you looking for points that are within 1KM of a particular lat/lon, or are you looking for points that are within 1KM of eachother? – Eve Freeman Jan 19 '12 at 06:55
  • Each point should find all points that are positioned in 1KM range away from it. – Alaa Qutaish Jan 19 '12 at 06:58

7 Answers7

6

If you make a modell with a grid of 1km spacing:

  0   1    2    3
 ___|____|____|____
0   |    |    |
   c|   b|a   |   d
 ___|____|____|____
1   |    |    |
    |    |f   |
 ___|e___|____|____
2   |    |g   | 

let's assume your starting point is a. If your grid is of 1km size, points in 1km reach have to be in the same cell or one of the 8 neighbours (Points b, d, e, f).

Every other cell can be ignored (c,g).

While d is nearly of the same distance to a as c, c can be dropped early, because there are 2 barriers to cross, while a and d lie on opposite areas of their border, and are therefore nearly 2 km away from each other.

For early dropping of element, you can exclude, it is enough to check the x- or y-part of the coordinate. Since a belongs to (0,2), if x is 0 or smaller, or > 3, the point is already out of range.

After filtering only few candidates, you may use exhaustive search.

user unknown
  • 35,537
  • 11
  • 75
  • 121
  • 1
    I am thinking about something similar, But we need to get the points directions, for example if i knew (g) is 1 Km away from (f) from the south and (a) is 1Km away from (f) from the North Then all points that are away from (g) to the south will be excluded WHEN scanning the points around (a). We only need to perform the FULL search at the first time to figure out all points relations to one point and then the possiblities will be reduced as the points relations will be more connected to each other. "the algorithm learns from the positions of the points" i dont know still thinking! – Alaa Qutaish Jan 21 '12 at 03:17
2

In your case, you should be looking at the GeoHash which allows you to quickly query the coordinates within a given distance.

FYI, MongoDB uses geohash internally and it's performing excellently.

qiao
  • 17,941
  • 6
  • 57
  • 46
  • Yes, I was writing up something about this but kept getting distracted. You can use the hash to find things that are nearby so that you don't have to directly compare all 1000. – Eve Freeman Jan 19 '12 at 07:17
  • Also, you shouldn't need to sort anything. Just create a new data structure (preferably with O(1) insert time--if you need the results sorted you could use an O(log n) insert tree structure) and keep a list of all of the points < 1km away for each point. Drop anything > 1km. – Eve Freeman Jan 19 '12 at 07:21
  • interesting suggestion, but what is the data model for this if using geohash? – Jasonw Jan 19 '12 at 07:40
  • @Jasonw: A geohash is simply a z-curve or a morton-curve. It's a hierarchical cluster algorithm and the simplest and cheapest. (http://en.wikipedia.org/wiki/Morton_number_(number_theory). Cheapest because it's not optimal. – Micromega Jan 19 '12 at 07:50
  • @David: okay, read on morton order but don't really understand the data model for geohash that morton order can be efficiently query – Jasonw Jan 19 '12 at 08:52
  • @Jasonw: What's a data model? A z curve is pure math. It's a curve traverse every points in the 2d. Instead to have a number f(x,y)=z calculate f(x,y)=z=hash where is hash is a reduction function log(x) * log(4) means hash has only 4 symbols. With this 4 symbols you can build a quad tree that hierarchically sort the 2d. Query is simply a string operations from the left to the right when you get deeper into the tree. Note that a z curve is suboptimal because you have huge overlap. – Micromega Jan 19 '12 at 09:45
  • @Jasonw: It's better to use a peano curve or a hilbert curve which is monster curve or a fractal. You can read my answer here: http://stackoverflow.com/questions/8753562/computing-which-points-latitude-longitude-are-within-a-certain-distance-in-my/8753768#8753768. – Micromega Jan 19 '12 at 09:45
1

Try with an R-Tree. The R-Tree supports the operation to find all the points closest to a given point that are not further away than a given radius. The execution time is optimal and I think it's O(number_of_points_in_the_result).

0

I had the same problem but in a web service development In my case to avoid the calculation time problem i used a simple divide & conquer solution : The idea was start the calculation of the distance between the new point and the others in every new data insertion, so that my application access directly the distance between those tow points that had been already calculated and put in my database

BiLL
  • 349
  • 1
  • 3
  • 14
0

You could compute geocodes of 1km range around each of those 1000 coordinates and check, whether some points are in that range. May be it's not optimum, but you will save yourself some sorting.

koressak
  • 191
  • 2
  • 10
0

If you want to lookup the matrix for each point vs. each point then you already got the right formula (1000^2-1000). There isn't any shortcut for this calculation. However when you know where to start the search and you want look for points within a 1KM radius you can use a grid or spatial algorithm to speed up the lookup. Most likely it's uses a divide and conquer algorithm and the cheapest of it is a geohash or a z curve. You can also try a kd-tree. Maybe this is even simpler. But if your points are in euklidian space then there is this planar method describe here: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.

Edit: When I say 1000^2-1000 then I mean the size of the grid but it's actually 1000^(1000 − 1) / 2 pairs of points so a lot less math.

Micromega
  • 12,486
  • 7
  • 35
  • 72
0

I have something sort of similar on a web page I worked on, I think. The user clicks a location on the map and enters a radius, and a function returns all the locations within a database within the given radius. Do you mean you are trying to find the points that are within 1km of one of the points in the radius? Or are you trying to find the points that are within 1km of each other? I think you should do something like this.

radius = given radius
x1 = latitude of given point;
y1 = longitude of given point;
x2 = null;
y2 = null;
x = null;
y = null;
dist = null;

for ( i=0; i<locationArray.length; i++ ) {
    x2 = locationArray[i].latitude;
    y2 = locationArray[i].longitude;
    x = x1 - x2;
    y = y1 - y2;
    dist = sqrt(x^2 + y^2);
    if (dist <= radius)
        these are your points
}

If you are trying to calculate all of the points that are within 1km of another point, you could add an outer loop giving the information of x1 and y1, which would then make the inner loop test the distance between the given point and every other point giving every point in your matrix as input. The calculations shouldn't take too long, since it is so basic.

Jay Elrod
  • 738
  • 1
  • 7
  • 20
  • 1
    Yeah, this is the basic brute-force search, If i have 1000 points then i have 1 million calulations! and points are dymanic in my case, so i have to perform one million calculations every few seconds .. and this does not seem to be optimum solution for doing it. right? – Alaa Qutaish Jan 23 '12 at 01:00