
I have a set of 400k geographical points (with latitude and longitude) that I am trying to cluster and plot on a map. Currently I am using MarkerCluster from the Folium package to visualise the clustering of the points, but this is very slow and the code keeps running indefinitely.

Just wondering whether there is any other Python package that can be used efficiently for this purpose?

Current code:

import folium
import numpy as np
from folium import plugins
from IPython.display import Image, clear_output, display, HTML

# df is a pandas DataFrame with StartLat and StartLong columns
data = df[['StartLat','StartLong']].to_numpy()  # as_matrix() was removed in pandas 1.0
avgLat = df['StartLat'].mean()
avgLong = df['StartLong'].mean()

mapa = folium.Map([avgLat, avgLong], zoom_start=6)
# MarkerCluster is provided by folium.plugins in current folium releases
marker_cluster = plugins.MarkerCluster().add_to(mapa)
latArr = np.array(df.StartLat)
lonArr = np.array(df.StartLong)

# Adds one Marker object per point (400k iterations)
for i in range(len(latArr)):
    folium.Marker([latArr[i], lonArr[i]], icon=folium.Icon(color='green', icon='ok-sign')).add_to(marker_cluster)
mapa.save('Clustering.html')
user3447653

1 Answer


Let me try to answer your question in 2 steps:

  1. Have you seen the question here? They have the same problem of clustering a large number of geographic coordinates. The suggested solutions use the clustering algorithms from scipy.cluster.

  2. However, normal cluster analysis techniques might not be well suited to geographic lat-long data. This is primarily because point samples taken from the surface of the earth tend to be correlated with each other (spatial autocorrelation), so the points violate the independence assumption inherent in many techniques in classical statistics. If you are sticking to Python, I would recommend looking at the package clusterPy (link here). It has several implementations of clustering algorithms that are commonly used on spatial data. Some reading up on spatial autocorrelation may also help in understanding the considerations (such as distance bands) that some of the algorithms require as parameters.
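
As a rough illustration of the scipy.cluster route from point 1, here is a minimal sketch using k-means from scipy.cluster.vq. The helper name cluster_points, the sample coordinates and the choice of k are assumptions made up for the example, and treating degrees as flat x/y coordinates is a simplification that only holds reasonably well over small areas.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def cluster_points(lat, lon, k):
    """Group points into k clusters with k-means on raw degree values.

    Assumption: degrees are treated as planar coordinates, which
    distorts distances away from the equator.
    """
    coords = np.column_stack([lat, lon]).astype(float)
    # minit='++' uses k-means++ seeding for the initial centroids
    centroids, labels = kmeans2(coords, k, iter=20, minit='++')
    return centroids, labels


# Hypothetical sample data: two tight groups of 50 points each
offsets = np.linspace(-0.05, 0.05, 50)
lat = np.concatenate([40.7 + offsets, 34.0 + offsets])
lon = np.concatenate([-74.0 + offsets, -118.2 + offsets])

centroids, labels = cluster_points(lat, lon, k=2)
counts = np.bincount(labels, minlength=2)  # number of points per cluster
```

With the centroids and counts in hand, one option is to put a single marker per cluster on the map instead of one marker per point, which keeps the number of map objects small.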

DotPi
  • Spatial Autocorrelation is a different thing. You need measurements (say temperature); and then these measurements tend to be correlated for nearby locations. This in particular applies if you have a regular grid of measurements... But if he only has coordinates, clustering such as OPTICS or DBSCAN can work just fine. The notion of point density does apply. – Has QUIT--Anony-Mousse Nov 22 '16 at 21:32
  • Yes. Spatial clustering techniques should only be used if the points represent values/measurements of something on the ground. Otherwise, if there are only lat/long values representing locations, normal clustering techniques work fine. – DotPi Nov 23 '16 at 16:48
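
For the density-based approach mentioned in the comments, a minimal sketch with scikit-learn's DBSCAN and the haversine metric might look like the following. The function name dbscan_latlon, the 1 km radius, min_samples=5 and the sample points are all assumptions for illustration, not values from the thread; for 400k points the radius and memory use would need tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used to convert km to radians


def dbscan_latlon(lat, lon, eps_km=1.0, min_samples=5):
    """Density-based clustering of lat/long points.

    The haversine metric expects [lat, lon] in radians, and eps is
    then an angle, so the radius in km is divided by the Earth radius.
    """
    coords_rad = np.radians(np.column_stack([lat, lon]))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,
        min_samples=min_samples,
        algorithm='ball_tree',
        metric='haversine',
    )
    # Label -1 marks points that belong to no dense region (noise)
    return model.fit_predict(coords_rad)


# Hypothetical sample data: two dense groups plus one isolated point
offsets = np.linspace(-0.002, 0.002, 20)
lat = np.concatenate([40.7 + offsets, 34.0 + offsets, [0.0]])
lon = np.concatenate([-74.0 + offsets, -118.2 + offsets, [0.0]])

labels = dbscan_latlon(lat, lon, eps_km=1.0, min_samples=5)
```

Unlike k-means, this does not need the number of clusters up front; it needs a neighbourhood radius and a minimum point count instead.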