3

I have a database with Latitude and Longitude values. What I want to do is to cluster these data in order to get less results every time that I search the database. Any ideas how to implement this?

This is a similar question: Clustering Lat/Longs in a Database

Community
  • 1
  • 1
George Melidis
  • 599
  • 3
  • 9
  • 25
  • Are you saying you want to limit the result set to a radius of a provided lat/long or something? – Ray Aug 20 '14 at 15:34
  • @Ray Yes, given the values I want to group them in 3 clusters. For example if we have 100 points and 30 of them are different but close to a specific place, and the same with the other 30 and 40 values. We want 3 clusters which contain: Cluster1(30 values), Cluster2(30 values) Cluster3(40 values) OR Better would be if I could send a center point and take as a result 30 values that are very close to this point – George Melidis Aug 20 '14 at 15:49
  • Is the longitude and latitude in 2 columns? So basically, you want to be able to specify a point and return the 30 closest records (or 40 depending on your request)? – Ray Aug 20 '14 at 15:54
  • @Ray I stored them in two columns separately and in one column as a spatial point in order to do some testing. I want to try giving a specific point in one case and not in the other case so leave the procedure to recognize that this 30 points are very close so put them in one group. – George Melidis Aug 20 '14 at 16:08
  • @Ray So two cases: One case: Given the specific points group data based on that points. Second case: Group the data in clusters and choose randomly initial points – George Melidis Aug 20 '14 at 16:12
  • Consider loading your data into a tool better suited for clustering than MySQL. – Has QUIT--Anony-Mousse Aug 20 '14 at 22:30

1 Answers1

2

It's not too crazy to find the closest records to a given longitude/latitude. Assuming you've a table location with columns longitude and latitude you could do the following (substitute out the and with the values from your point.

   SELECT id, latitude, longitude,  
               ROUND(6353 * 2 * ASIN(SQRT(POWER(SIN((<point_latitude> - 
                      abs(latitude)) * pi()/180 / 2),2) + COS( <point_latitude>  * pi()/180 ) 
                      * COS( abs(latitude) *  pi()/180) 
                      * POWER(SIN(( <point_longitude> - longitude) 
                      *  pi()/180 / 2), 2) )), 2) AS distance
     FROM location
     ORDER BY distance ASC 
     LIMIT 30;

You could do similar with spatial functions.

Ray
  • 40,256
  • 21
  • 101
  • 138
  • Ok I understand but what in case of not giving a specific point? I found that kmeans can do this but I don't know how to implement. – George Melidis Aug 20 '14 at 16:30
  • @GeorgeMelidis Not sure how you'd implement kmeans clustering, but check out this http://jonisalonen.com/2012/k-means-clustering-in-mysql/ It looks like a stored proc is needed – Ray Aug 20 '14 at 16:33
  • Thanks Ray I have seen this article. It's the only one I found :-). I'm going to try it and if I have good results in my data I post! – George Melidis Aug 20 '14 at 16:38