1

Right I have been trying to work out how to compare a given postcode to a database of say store addresses and have them ordered in terms of which one is closest to the given postcode (or ZIP code I guess).

This is mainly out of interest, rather than me asking you for advice and then selling it to a client :-O

First of all after research I discovered that you have to do distance with Lat/Long so I found an API that converts postcodes/zip codes to lat long and now my DB has a structure such as id, store_name, lat, long, postcode and I can convert a given postcode to a lat long.

But how in SQL do I make a query for the ones closest to a given lat long?

Pez Cuckow
  • 14,048
  • 16
  • 80
  • 130

3 Answers3

1

Try something like this:

// get all the zipcodes within the specified radius - default 20
    function zipcodeRadius($lat, $lon, $radius)
    {
        $radius = $radius ? $radius : 20;
        $sql = 'SELECT distinct(ZipCode) FROM zipcode  WHERE (3958*3.1415926*sqrt((Latitude-'.$lat.')*(Latitude-'.$lat.') + cos(Latitude/57.29578)*cos('.$lat.'/57.29578)*(Longitude-'.$lon.')*(Longitude-'.$lon.'))/180) <= '.$radius.';';
        $result = $this->db->query($sql);
        // get each result
        $zipcodeList = array();
        while($row = $this->db->fetch_array($result))
        {
            array_push($zipcodeList, $row['ZipCode']);
        }
        return $zipcodeList;
    }

UPDATE: There is some discussion about efficiency. Here is a little benchmark for you with this query. I have a database that contains EVERY zipcode in the US. Some of them are duplicate because of the way zipcodes work (outside the scope of this topic). So I have just under 80k records. I ran a 20 mile radius distance on 90210:

SELECT distinct(ZipCode) FROM zipcodes  WHERE (3958*3.1415926*sqrt((Latitude-34.09663010)*(Latitude-34.09663010) + cos(Latitude/57.29578)*cos(34.09663010/57.29578)*(Longitude- -118.41242981)*(Longitude- -118.41242981))/180) <= 20

I got back 366 total records and Query took 0.1770 sec. How much more efficient do you need?

Chuck Burgess
  • 11,600
  • 5
  • 41
  • 74
  • It return the zip codes within a specific radius (20 in this example). You can compute the distance between 2 coord by using this : 6371*acos(cos(LatitudeA)*cos(LatitudeB)*cos(longitudeB -longitudeA)+sin(LatitudeA)*sin(latitudeB)) with coords in radian – grunk Apr 26 '11 at 13:17
  • It calculates for **all points** in the database whether they fall inside a circle _(hence the use of PI, sqrt and cos)_ and then selects the ZipCodes that belong to these points. – Johan Apr 26 '11 at 13:19
  • Not very efficient, because it calculates the distance for every entry in the database. It's better to create a bounding box first, and select based on the lat/long of the bounding box using a WHERE clause, then execute the formula only for the subset of rows that fall within the bounding box. – Mark Baker Apr 26 '11 at 13:26
  • Sorry, but I would have to see the benchmarks on this. MySQL can handle 43k records with this calculation and it wouldn't be that significant to worry about. I have a database with 80k records (covers all of the US) and it took .2630 seconds. How much more can you save? – Chuck Burgess Apr 26 '11 at 13:48
0

check out this great open source project

Disclaimer: Not my project, and nor am I contributor. Purely a recommendation.

Community
  • 1
  • 1
Mild Fuzz
  • 29,463
  • 31
  • 100
  • 148
0

See this answer to a previous question for an example of calculating a bounding box before querying MySQL. This allows the complex formula in the MySQL query to run against a subset of the database entries, rather than against every entry in the table.

Community
  • 1
  • 1
Mark Baker
  • 209,507
  • 32
  • 346
  • 385
  • There are approximately 43,000 zipcodes in the US. I would venture to say that all of the extra code to do something like this is overkill. While it is a great lesson on optimization, I'm not sure its entirely necessary, is it? – Chuck Burgess Apr 26 '11 at 13:31
  • All what extra code? The overhead in building a bounding box is miniscule. The reduction in using a bounding box to filter 1,000 records from a database down to 10 before performing a distance calculation is enormous.... even if a first pass with a bounding box doesn't return any records, so the bounding box need sextending two or three times, it's still vastly faster than calculating the distance for every single row, even on small datasets – Mark Baker Apr 26 '11 at 13:35
  • Can you show some benchmarks on this? I show in my comment how it takes fractions of a second to query every zipcode in the US. How much more efficiency do you need? – Chuck Burgess Apr 26 '11 at 13:54
  • No I can't cite any benchmarks.... and I'm not in a position to execute any tests at the moment, so I guess that proves me wrong because I can't prove that I'm right about this; but using a WHERE clause to limit a calculation to (for example) 10 rows (selected using a WHERE clause against an index) instead of 1000 rows __normally__ makes some difference to a SQL query. – Mark Baker Apr 26 '11 at 14:03
  • You are right! I am certain it will make a difference. However, the difference will NOT be noticeable to a human on such a small set of records. – Chuck Burgess Apr 26 '11 at 14:06
  • The difference may well come when you have multiple store records per zip code... OP is holding lat/long/zip in a stores table, so the select would be against that table, which may have 50/60 stores in every zip... a much more substantial dataset – Mark Baker Apr 26 '11 at 14:11