3

I run a geolocation-based social network. Members can see other members based on how close they are.

Right now my MySQL query looks like:

$lat_min = $geo['user_lat'] - 5;
$lat_max = $geo['user_lat'] + 5;
$long_min = $geo['user_long'] - 5;
$long_max = $geo['user_long'] + 5;    

$members_query = "SELECT " . $fields . " FROM members WHERE (user_lat BETWEEN " . $lat_min . " AND " . $lat_max . " AND user_long BETWEEN " . $long_min . " AND " . $long_max . ") OR (gc_lat BETWEEN " . $lat_min . " AND " . $lat_max . " AND gc_long BETWEEN " . $long_min . " AND " . $long_max . ")";

user_lat and user_long are coordinates based on geolocation if they have it turned on in their browser. gc_lat and gc_long and coordinates based on their IP address. These rows are all indexed in the database. I am pulling all members within 10 degrees.

The problem is this query takes about 2 seconds to perform for our 250,000+ members, and we want the site to scale.

ATTEMPT 2: I have tried assigning quadrants to each member e.g. "36x-99" ... I am rounding off the latitude and longitude to the nearest multiple of 3 to label the quadrant, and then I only pull quadrants within 12 degrees of the quadrant the member is in.

$members_query = "SELECT " . $fields . " FROM members WHERE quadrant IN ('36x-99', '33x-99', '30x-99', ...);

This gave me no noticeable difference in query speed.

Anyone have any ideas of what I should do? I need to find a solution that will allow the site to scale better.

Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
Ben Buras
  • 51
  • 1
  • 2
  • 12
  • possible duplicate of [Fastest Way to Find Distance Between Two Lat/Long Points](http://stackoverflow.com/questions/1006654/fastest-way-to-find-distance-between-two-lat-long-points) – Giacomo1968 Jan 01 '14 at 05:35

1 Answers1

7

The problem is the way you are storing data in the database is not suited for the type of task you are performing. Using Point values in Geometry data points is the way to go. Actually coded something 4+ years back for this purpose, but having issues finding it. But this post seems to cover it well.

EDIT Okay, found my old code, but it’s referring to old client data that I obviously cannot share. But the key to speed with coordinates in databases is using POINT data stored in the database table with the type of GEOMETRY. More details here on the official MySQL site. Since I have needed a reason to revisit this type of code—and the concepts—for a while here is a quick MySQL script I whipped up to create a sample table with sample data to convey the basic concepts. Once you understand what is happening, it opens up lots of cool options.

Also found this great/simple explanation of the concept as well.

And found another great assessment of spatial data in MySQL 5.6. Lots of great info on indexes & performance. Specifically regarding MySQL spatial index performance:

MyISAM tables support Spatial indexes, so the above queries will use those indexes.

And on the other side of that:

The InnoDB engine does not support spatial indexes, so those queries will be slow.

And here is my basic MySQL testing scripts to help illustrate the concept:

/* Create the database `spatial_test` */
CREATE DATABASE `spatial_test` CHARACTER SET utf8 COLLATE utf8_general_ci;

/* Create the table `locations` in `spatial_test` */
CREATE TABLE `spatial_test`.`locations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `coordinates` point NOT NULL,
  UNIQUE KEY `id` (`id`),
  SPATIAL KEY `idx_coordinates` (`coordinates`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

/* Insert some test data into it. */
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(27.174961 78.041822)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(27.985818 86.923596)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(44.427963 -110.588455)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(19.896766 -155.582782)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(40.748328 -73.985560)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(40.782710 -73.965310)'));

/* A sample SELECT query that extracts the 'latitude' & 'longitude' */
SELECT x(`spatial_test`.`locations`.`coordinates`) AS latitude, y(`spatial_test`.`locations`.`coordinates`) AS longitude FROM `spatial_test`.`locations`;

/* Another sample SELECT query calculates distance of all items in database based on GLength using another set of coordinates. */
SELECT GLength(LineStringFromWKB(LineString(GeomFromText(astext(PointFromWKB(`spatial_test`.`locations`.`coordinates`))), GeomFromText(astext(PointFromWKB(POINT(40.782710,-73.965310))))))) AS distance
FROM `spatial_test`.`locations`
;

/* Yet another sample SELECT query that selects items by using the Earth’s radius. The 'HAVING distance < 100' equates to a distance of less than 100 miles or kilometers based on what you set the query for. */
/* Earth’s diameter in kilometers: 6371 */
/* Earth’s diameter in miles: 3959 */
SELECT id, (3959 * acos(cos(radians(40.782710)) * cos(radians(x(`spatial_test`.`locations`.`coordinates`))) * cos(radians(y(`spatial_test`.`locations`.`coordinates`)) - radians(-73.965310)) + sin(radians(40.782710)) * sin(radians(x(`spatial_test`.`locations`.`coordinates`))))) AS distance 
FROM `spatial_test`.`locations`
HAVING distance < 100
ORDER BY id
;
Community
  • 1
  • 1
Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
  • 1
    OK, I will look into this. – Ben Buras Jan 01 '14 at 05:43
  • 1
    Seems like you are on the right track here. I will have to figure this out tomorrow. – Ben Buras Jan 01 '14 at 06:03
  • Thanks! That helps immensely. I'm trying it now. – Ben Buras Jan 01 '14 at 15:47
  • The only thing I'm wondering is will it calculate the distance around the earth, or is it cutting through the earth? – Ben Buras Jan 01 '14 at 15:53
  • I guess it doesn't matter if it's cutting through the earth as long as it's quick. I can use the Haversine formula in the PHP, like I am currently doing, to calculate the actual distance. – Ben Buras Jan 01 '14 at 16:09
  • 2
    “The only thing I'm wondering is will it calculate the distance around the earth, or is it cutting through the earth?” Very good question. This question seems to explain it well. http://stackoverflow.com/questions/574691/mysql-great-circle-distance-haversine-formula – Giacomo1968 Jan 01 '14 at 16:51
  • 1
    I got my coordinates in the database with this: http://stackoverflow.com/questions/8954417/mysql-update-geo-point – Ben Buras Jan 01 '14 at 17:21
  • 1
    Great! I have just edited my answer to add an Earth diameter based query as well. Thanks for helping me care about this stuff again! The query is based on that one post I liked to above as well as Google Maps API documentation here: https://developers.google.com/maps/articles/phpsqlsearch_v3 – Giacomo1968 Jan 01 '14 at 17:28
  • It looks like you have one too many closing parens on the Earth diameter based query. It's also returning me negative values for distance. – Ben Buras Jan 01 '14 at 18:07
  • @BenBuras ??? No idea what issues you are having. When I create the exact test database as outlined & then run the queries exactly as I have them—even copied & pasted them just now—they work fine. For all intents & purposes, I am done! I hope my answer & the details/examples provided are able to help you and others. Cheers! – Giacomo1968 Jan 01 '14 at 18:11
  • 1
    @BenBuras Thanks! So I assume that it is all working out? I’d also suggest reading this article on the new additions in MySQL 5.6. Two very key performance hints in it, “MyISAM tables support Spatial indexes, so the above queries will use those indexes.” And “The InnoDB engine does not support spatial indexes, so those queries will be slow.” http://www.mysqlperformanceblog.com/2013/10/21/using-the-new-spatial-functions-in-mysql-5-6-for-geo-enabled-applications/ – Giacomo1968 Jan 01 '14 at 22:11
  • 1
    Yes, it is working now! I did convert the table to MyISAM and put a spatial index on the coordinates. Now this query is no longer coming up in my slow queries log. Thanks so much for your help! – Ben Buras Jan 01 '14 at 22:47
  • @BenBuras Also, just found an error in my sample `CREATE TABLE` where it said `coordinates geometry NULL,` that should actually use point instead and be `NOT NULL` as this: `coordinates point NOT NULL,`. When you do that you can set a spatial index like this: `SPATIAL KEY idx_coordinates (coordinates)`. Your queries will stay the same but this new structure is even faster. – Giacomo1968 Jan 02 '14 at 00:40