0

I have a MySQL Routine that is getting records within a 50 mile radius when passed Latitude and Longitude via utilizing the Haversin equation.

While this works great, and is pretty speedy (considering it's searching through 82k records), I am thinking that I can get better performance by creating a similar procedure utilizing a POINT column.

So, in my table I created an extra column called Location, gave it a datatype of POINT, updated my data to pass lat & lon to the Location column. Data is valid, and is fine., and added a Spatial Index

The question is, how can I convert the following query to use the Location column, instead of lat and lon columns.

SET @LAT := '37.953';
SET @LON := '-105.688';

SELECT DISTINCT
BPZ.`store_id`,         
3956 * 2 * ASIN(SQRT(POWER(SIN((@LAT - abs(Z.`lat`)) * pi()/180 / 2),2) + COS(@LAT * pi()/180 ) * COS(abs(Z.`lat`) *  pi()/180) * POWER(SIN((@LON - Z.`lon`) *  pi()/180 / 2), 2))) as distance,
c.`name`,c.`address`,c.`city`,c.`state`,c.`phone`,c.`zip`,c.`premise_type`
FROM
`zip_codes` as Z, 
`brand_product_zip` as BPZ
LEFT JOIN `customers` c ON c.`store_id` = BPZ.`store_id`
WHERE
BPZ.`zip` = Z.`zip`
AND 
3956 * 2 * ASIN(SQRT(POWER(SIN((@LAT - abs(Z.`lat`)) * pi()/180 / 2),2) + COS(@LAT * pi()/180 ) * COS(abs(Z.`lat`) *  pi()/180) * POWER(SIN((@LON - Z.`lon`) *  pi()/180 / 2), 2))) <= 50
ORDER BY
distance LIMIT 20

I understand that this has been asked before, however, everything I see points to calculations based on lat and lon and not the POINT column

Updated Code:

SET @lat = 41.92;
SET @lon = -72.65;
SET @kmRange = 80.4672; -- = 50 Miles

SELECT *, (3956 * 2 * ASIN(SQRT(POWER(SIN((@lat - abs(`lat`)) * pi()/180 / 2),2) + COS(@lat * pi()/180 ) * COS(abs(`lat`) *  pi()/180) * POWER(SIN((lon - `lon`) *  pi()/180 / 2), 2)))) as distance
FROM    `zip_codes`
WHERE   MBRContains(LineString(Point(@lat + @kmRange / 111.1, @lon + @kmRange / (111.1 / COS(RADIANS(@lat)))), Point(@lat - @kmRange / 111.1, @lon - @kmRange / (111.1 / COS(RADIANS(@lat))))), `Location`)
Order By distance
LIMIT 20
Kevin
  • 2,684
  • 6
  • 35
  • 64
  • if you can geocode you data into a point, how about just going after say LIMIT 5 or the like on your query when you are going after some close points – Drew May 24 '13 at 12:48
  • really not sure what you are talking about and how it relates to my question. I need 20 records back from this, hence the `LIMIT 20` – Kevin May 24 '13 at 12:53
  • oh i wasnt even looking at your limit, sorry. so you have point data in your db. you create and index on it, and that is the focus of your where clause in the select (not lat and longitude) – Drew May 24 '13 at 12:56
  • that is what I would like. Right now, both the `where` clause and the `distance` column are calculated with haversin based on the passed `@LAT` and `@LON` – Kevin May 24 '13 at 12:59

2 Answers2

2

Have you looked into hilbert curves solutions? A spatial index doesn't deliver the exact solution? . With a mysql spatial index you can use mbrcontains:

CREATE TABLE lastcrawl (id INT NOT NULL PRIMARY KEY, pnt POINT NOT NULL) ENGINE=MyISAM;

INSERT
INTO    lastcrawl
VALUES  (1, POINT(40, -100));

SET @lat = 40;
SET @lon = -100;

SELECT  *
FROM    lastcrawl
WHERE   MBRContains
                (
                LineString
                        (
                        Point
                                 (
                                 @lat + 10 / 111.1,
                                 @lon + 10 / ( 111.1 / COS(RADIANS(@lat)))
                                 ),
                        Point    (
                                 @lat - 10 / 111.1,
                                 @lon - 10 / ( 111.1 / COS(RADIANS(@lat)))
                                 )
                        ),
                pnt
                );

Look here: MySQL - selecting near a spatial point. Here: http://www.drdobbs.com/database/space-filling-curves-in-geospatial-appli/184410998

Community
  • 1
  • 1
Micromega
  • 12,486
  • 7
  • 35
  • 72
  • Am I able to use that as a `distance` column as well? Also, what is the 111.1 for? Is that a potential km radius? – Kevin May 29 '13 at 17:55
  • 111km is roughly a degree longitude: http://stackoverflow.com/questions/1006654/fastest-way-to-find-distance-between-two-lat-long-points. Basically yes it's getting the distance and sorting it. – Micromega May 29 '13 at 18:05
  • Per that question, it's the 10 in there that's the distance factor. How can I factor this into a `distance` column? See the original question, as I am a bit confused how to apply this. – Kevin May 29 '13 at 18:11
  • Yes, I see. It's fetching a bounding box within 10 km square if you want exact distance you need apply the harvesine formula. It's cheaper to just the limit the bounding box. Why do you need exact solver? A spatial index is expensive. Look in my link with the hilbert curves. Hilbert curves are even more expensive. – Micromega May 29 '13 at 18:19
  • Btw. Instead of the harvesine formula you can also use a polygone with mysql? – Micromega May 29 '13 at 18:23
  • approx distance is fine in the `Where` clause. However, I do need an almost exact distance column. I am finding that this queery is much faster than it was before via utilizing what you posted above, and the haversine formula is proving pretty effective for getting my `distance` column, I was just wondering if I could apply the same principle... which I see I cannot, since it will only return a boolean response – Kevin May 29 '13 at 18:24
  • Not sure if I understood you. Of course you can get exact solver and fast speed. You need to create a spatial index and use mbrcontains and then the harvesine formula. It's left to you as exercise. – Micromega May 29 '13 at 18:28
  • yes, that is what I have done. I'll put the new query in an edit to my question. b.t.w. query runs in .007~ seconds ;) – Kevin May 29 '13 at 18:32
  • Unfortunately, I cannot award the bounty for another 17 hours... do you think this could get a bigger performance boost by putting the MBRContains into a function? And the Haversine column? – Kevin May 29 '13 at 18:43
  • Thank you. The problem with r-tree is that they can overlap and you need maybe 2 queries for a mbr. Other database uses quadtrees or hilbert curves. If you need a faster speed I would try my php class with hilbert curves. It uses a quadkey and a simple string search. Then you can put an index on that quadkey? Maybe it's faster then the r-tree thing? You can download my hilbert curve class at phpclasses.org? – Micromega May 29 '13 at 20:20
  • :) Thanks for all this. Class looks pretty interesting... wow it's been awhile since I was on that site :) – Kevin May 30 '13 at 13:12
  • A quadkey is easy enouh. Translate the geo coordinate into a binary and concatenate it and treat the result as a base-4 number. Check the ms bing tiles map system for a source code:http://msdn.microsoft.com/en-us/library/bb259689.aspx. – Micromega Sep 30 '14 at 18:41
1

The article Nearest-location finder for MySQL explains in detail various options, and the best choice for use with the Spatial Extensions starting with MySQL 5.6.

From the article, this sample query lists zip codes within a 50 mile radius from given coordinates (42.81, -70.81):

SELECT zip, primary_city,
       latitude, longitude, distance_in_mi
  FROM (
SELECT zip, primary_city, latitude, longitude,r,
       69.0 * DEGREES(ACOS(COS(RADIANS(latpoint))
                 * COS(RADIANS(latitude))
                 * COS(RADIANS(longpoint) - RADIANS(longitude))
                 + SIN(RADIANS(latpoint))
                 * SIN(RADIANS(latitude)))) AS distance_in_mi
 FROM zip
 JOIN (
        SELECT  42.81  AS latpoint,  -70.81 AS longpoint, 50.0 AS r
   ) AS p
 WHERE latitude
  BETWEEN latpoint  - (r / 69)
      AND latpoint  + (r / 69)
   AND longitude
  BETWEEN longpoint - (r / (69 * COS(RADIANS(latpoint))))
      AND longpoint + (r / (69 * COS(RADIANS(latpoint))))
  ) d
 WHERE distance_in_mi <= r
 ORDER BY distance_in_mi;
dgeske
  • 616
  • 4
  • 11