3

given one point A, get the nearest 100 points from one million data records

  • database is MySql
  • one million records of latitude and longitude
  • these points mean user's current position when logined , so they may be changing.

scenario:

when a user open a page , display the nearest top 100 other people.

looping
  • 1,141
  • 3
  • 11
  • 20
  • How are `latitude` and `longitude` stored? – Barranka Aug 25 '14 at 01:51
  • 2
    Have you searched for spacial data types? See e.g. http://stackoverflow.com/questions/2096385/formulas-to-calculate-geo-proximity – Morten Aug 25 '14 at 01:54
  • @Barranka eg, lng 38.619752 lat 59.765115 – looping Aug 25 '14 at 01:56
  • If I understand correctly, you are storing the `latitude` and `longitude` in two separate (`double` or `decimal`) fields. Am I right? – Barranka Aug 25 '14 at 01:58
  • 3
    are you using mysql [extensions for spatial data](http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html) or just storing them in a plain table? – pqnet Aug 25 '14 at 02:00
  • I found [this answer at gis.stackexchange.com](http://gis.stackexchange.com/questions/27878/how-to-find-20-closest-points-efficiently), it may help you. (I agree with both Morten and pqnet, spatial data may be the way to go) – Barranka Aug 25 '14 at 02:01
  • Another useful link: ["Geo (proximity) search with MySQL" (pdf), by Alexander Rubin, Senior consultant, MySQL AB](http://www.notaires.fr/sites/default/files/geo_searchjkkjkj_0.pdf) – Barranka Aug 25 '14 at 02:06
  • The best you can probably do is use a spatial index and assume the points are evenly distributed. This assumption will allow you to compute an initial bounding box that statistically should enclose the 100 desired points. If it yields fewer, double the bounding box size and search again. Sort on geo distance (you'll need the standard spherical trig formula) and take the top 100. – Gene Aug 25 '14 at 02:23
  • 1
    If the data is changing do you really need the 100 closest points or just 100 close points? That is if there are 200 people within 600 meters does it really matter which 100 you return? If there are 50 people within 600m and another 60 people between 600m and 1200m does it really matter which 50 of the 60 you return? Is the use social, or life and death? – ben rudgers Aug 25 '14 at 03:04

2 Answers2

2
  1. Setup spatial extensions for your database, if you have not done that already.
  2. Store latitude/longitude of your 1M locations in a geography-type column in the database.
  3. Create a spatial index on that column.
  4. Run a SELECT query with a WHERE clause based on distance between your point of interest and locations in the table. The query will utilize the above index.

Here is a good article on using spatial extension in MySQL 5.6 for exactly this sort of things.

Michael Diomin
  • 530
  • 3
  • 13
0

http://en.wikipedia.org/wiki/Geohash might be a quick way to speed up the average case, but worst case behaviour would still be bad. The article suggests that you index by geohash and, on a query, retrieve all points in a bounding box that amounts to a geohash prefix. If the bounding box is small and you find a match in it closer than any point outside the bounding box then you have succeeded quickly, but neither of these things might be true.

mcdowella
  • 19,301
  • 2
  • 19
  • 25