4

Given a set of geolocations (in the form latitude/longitude), how would I go about finding the most popular location?

As an example, I've compiled a map containing a variety of points:

Map Link

We can see that - besides 1, 4, 6 and 9 - all of the points are grouped at approximately one location. How could I calculate the mean location of this group? Ideally, as the map becomes more populated, I'd like to calculate the 2nd most popular location, 3rd most popular location etc.

How could I tackle this?

Thanks in advance.

mjv
  • 73,152
  • 14
  • 113
  • 156
cgsh
  • 55
  • 4
  • 3
    A kmeans algorithm springs to mind – Mark Baker Jan 28 '13 at 20:06
  • I would suggest QT clustering instead, or any other one where you don't know the number of clusters you are after. @Shep, you are after clustering algorithms. Read about them (QT, kmeans, fuzzy c-means, and others) and check what might be better for what you expect. As it stands, this question is not a good fit since there will be divergent opinions since you haven't tried any form of clustering yet. – mmgp Jan 28 '13 at 20:19
  • In particular, check out hierarchical clustering algorithms. The core idea of hierarchical clustering is that objects are more related to nearby objects than to objects farther away, which seems like what you want in this case. The difficulty with clustering is that, depending on the dataset, the clustering you want to do can change. In your map, something 200 meters away might be in a cluster, but in a map of Europe, something 200km away might be considered close enough to cluster. I recommend experimentation. – Weaver Jan 28 '13 at 20:25

2 Answers2

5

If you need a simple solution that will give you a good estimate and is easy to code...

  • Pick a radius for what you consider the size of a location (we'll say 100m)
  • Iterate through your locations making them the center of your circle
  • Count how many other points are within the circle using Point-in-Circle
  • The location with the most points within it's circle becomes your most popular, now remove all of those points from your set and repeat for 2nd most popular to nth most popular.

How to convert distance to latitude & longitude? transform longitude latitude into meters

For example, 1 degree of latitude is between 110574 and 111693 meters.

Community
  • 1
  • 1
Louis Ricci
  • 20,804
  • 5
  • 48
  • 62
2

The DBSCAN algorithm is probably what you are looking for.

It's an algorithm that finds clusters of points based on their density. Since in your case popular means dense, you can use it to solve this task. It requires two parameters:

  • eps, the radius around each point within which to look for other points to form a cluster,
  • minPts, the required number of points within the radius around a point to consider the group of nodes as a cluster.

You also need a function to measure the distance between points. Since you have (latitude, longitude) couples (generally in WGS84 format), the Haversize distance is what you need.

There are several implementations of the algorithm. If you are using Java, Apache Commons Math provides a decent implementation (see here for more details and some code snippet). Invoke the DBSCANClusterer with eps=1.0 (radius of 1 km) and minPts=0 (clusters have 1 or more points). Refer to this answer for implementing the Haversine distance (be sure to match the same measurement unit used for eps). Finally sort the clusters by decreasing size to have them sorted by "popularity":

Collections.sort(clusters, (o1, o2) -> 
    Integer.compare(o2.getSize(), o1.getSize());
Cluster<? extends Clusterable> mostPopular = clusters.get(0);

If I remember correctly, this implementation solves the problem in quadratic time with respect to its size (the number of points). If all the instances of the problem that you will face has the same size of your example, you will have no problems.

Community
  • 1
  • 1
Stefano Bragaglia
  • 622
  • 1
  • 8
  • 25