0

so I have an array of latitude/longitude (it's fake latitude/longitude as you can see, but just to illustrate the point & the original array size is MUCH larger than this):

<?php
$my_nodes = array(
1=> array(273078.139,353257.444),
2=> array(273122.77,352868.571),
3=> array(272963.687,353782.863),
4=> array(273949.566,353370.127),
5=> array(274006.13,352910.551),
6=> array(273877.095,353829.704),
7=> array(271961.898,353388.245),
8=> array(272839.07,354303.863),
9=> array(273869.141,354417.432),
10=> array(273207.173,351797.405),
11=> array(274817.901,353466.462),
12=> array(274862.533,352958.718),
13=> array(272034.812,351852.642),
14=> array(274128.978,354676.828),
15=> array(271950.85,354370.149),
16=> array(275087.902,353883.617),
17=> array(275545.711,352969.325)));

?>

I want to be able to find the closest node (in this case a node is either 1,2,3, 4,5, ...) for a given latitude X and latitude Y. I know the easiest way to do this is to do a for loop and then do a margin error difference (abs(latitude_X - latitude_X_array) + abs(latitude_Y - latitude_Y_array)) but this will be very inefficient as the size of the array grows.

I was thinking of doing a binary search, however the array needs to be sorted first in a binary search, however it's hard to sort latitude/longitude and in the end we're finding the CLOSEST latitude/longitude in the array for a given lat X, long Y. What approach should I take here?

UPDATE:

Mark has a valid point, this data could be stored in a database. However, how do I get such info from the db if I want the closest one?

aherlambang
  • 14,290
  • 50
  • 150
  • 253

4 Answers4

2

Have a read of this article which explains all about finding the closest point using latitude and longitude from records stored in a database, and also gives a lot of help on how to make it efficient.... with PHP code examples.

Mark Baker
  • 209,507
  • 32
  • 346
  • 385
  • This is awesome. One question though, it says "If you have parameters $lat and $lon for the centre of the bounding circle (converted from degrees to radians)". So can $lat = 32.223711 and $long = -110.948954 be a valid value or is it in radians? – aherlambang Mar 10 '11 at 12:53
  • In that first SQL example, the lat and long must have already been converted from degrees to radians, because most trigonometric functions in any language assume radians... the next examples (using PHP to build the query) the lat/long are converted to radians within the example code – Mark Baker Mar 10 '11 at 13:16
  • I guess the only problem for me now would be picking the radius of the bounding circle. If it's too big, then too many points.. too small may result in no result back. Any ideas on this? – aherlambang Mar 10 '11 at 13:29
  • I've typically suggested to go with whatever initial value seems most appropriate for your data (if it's just regional data, go with eg 10km, if it's national, go with eg 50km, if it's global, go with 100km)... then (if that doesn't return any results) reissue the request with a larger bounding box, perhaps doubling the size. It's a simplistic approach, but still faster than searching through the entire dataset. The real key to working out the bounding box size is having some idea of the scattering and clustering of locations within your entire dataset.... but that's a whole other question. – Mark Baker Mar 10 '11 at 13:42
  • Gotcha... I think I have some intuition with my datasets – aherlambang Mar 10 '11 at 15:52
0

I had a similar problem when I wanted to re-sample a large number of lat/long points to create a heightfield grid. The simplest way I found was like this:

  1. divide the lat/long space up into a regular grid
  2. create a bucket for each grid square
  3. go through the list, adding each point to the bucket for grid square it falls in
  4. then find the grid square your X,Y point falls in, and search outwards from there
Rob Agar
  • 12,337
  • 5
  • 48
  • 63
0

I'm assuming you're storing your data in a DB table like this?

id | lat   | long   | data
------------------------------------------------
1 | 123.45 | 234.56 | A description of the item
2 | 111.11 | 222.22 | A description of another item

In this case you can use SQL to narrow your result set down.

if you want to find items close to grid ref 20,40, you can do the following query:

SELECT * 
FROM locations
WHERE lat BETWEEN 19 AND 21
AND long BETWEEN 39 AND 41

This will return all the tiems in a 2x2 grid near your specified grid ref.

Several databases also provide spacial datatypes (MySQL and Postgres both do) and they might be worth investigating for this job. I don't, however, have experience with such things so I'm afraid I couldn't help with those.

GordonM
  • 31,179
  • 15
  • 87
  • 129
  • but then, how do we know the closest one? I just need one lat/long that is the closest.. this will give me a set of points possibly – aherlambang Mar 10 '11 at 13:03
  • MySQL spatial support is rudimentary, but Postgres with PostGIS is good - see http://postgis.refractions.net/ – Rob Agar Mar 10 '11 at 13:04
  • @EquinoX You'd still have to do some PHP processing, but it would at least narrow your result set down to something a bit more manageable.@rob, just one more reason to be a Postgres fan. :) I really don't enjoy working with MySQL – GordonM Mar 10 '11 at 13:08
  • @Gordon: You can do the calculation in SQL, too. Just define a function. @EquinoX: The basic idea is to narrow the range of the search by defining a bounding box, calculate the distance to each of the points inside the bounding box, then pick the lowest number. – Mike Sherrill 'Cat Recall' Mar 10 '11 at 13:22
  • @Catcall - I've previously posted a MySQL UDF that performs the haversine calculation somewhere on SO, for use within a SELECT statement ( http://stackoverflow.com/questions/3986556/distance-calculations-in-mysql-queries/3986635#3986635 ) ... useful to then include a HAVING clause in the query (alongside the WHERE lat... etc) to actually return the results from a bounding circle rather than just the bounding square – Mark Baker Mar 10 '11 at 13:46
  • @EquinoX - With the Haversine calculation in MySQL used to add a distance column to the resultset; the result can then be sorted on that column, and a LIMIT 1 applied to return only the closest result – Mark Baker Mar 10 '11 at 13:52
0

To sort a multidimensional array in PHP you'd have to iterate over all elements and compare two at a time. For an array of size n that makes O(n) comparisons. Finding the closest node in the sorted array needs O(log n) distance calculations.
If you iterate over all elements, calculate the distance to the target node and remember the closest element, you'd be done with O(n) distance calculations.
Assuming that comparing two nodes means to compare both lat and lon values and thus is O(2), and further assuming that calculating the distance between two nodes is O(3), you end with
O(2n + 3 log n) for binary search and O(3n) for naive search.
So binary search takes n - 3 log n less operations and is round about 33% faster.

Depending on the distribution of your nodes it could be even faster to sort the list into buckets. During filling the buckets you could also throw away all nodes that would go in a bucket that could never hold the closest node. I can explain this in more detail if you want.

rik
  • 8,592
  • 1
  • 26
  • 21
  • Don't you kind of agree with Mark Baker above that a database might be a better way? – aherlambang Mar 10 '11 at 13:26
  • @EquinoX: Sure. However if the data is not yet in a DB it could certainly take longer to insert and query it than searching it with PHP. – rik Mar 10 '11 at 15:38
  • ...what? I would love to hear how you're going to do a binary search over two-dimensional data, or how you're going to sort it in the first place. – Nick Johnson Mar 10 '11 at 17:40
  • I'd sort with `uasort($my_nodes, 'compare_nodes)` and search with `Array_BinarySearch($target, $nodes, 'compare_nodes, $p)` from http://www.php.net/manual/en/function.array-search.php#93352. If the search does not find `$target`, it returns the index where it would be. I'd then calculate the distance between `$target` and the nodes around this index. – rik Mar 11 '11 at 10:54