For a project I am currently working on, I need to cluster a relatively large number of GPS coordinate pairs into location clusters. After reading many posts and suggestions here on Stack Overflow and trying different approaches, I still run into the same problem when running it.
Dataset size: a little over 200 thousand pairs of GPS coordinates
[[108.67235 22.38068 ]
[110.579506 16.173908]
[111.34595 23.1978 ]
...
[118.50778 23.03158 ]
[118.79726 23.83771 ]
[123.088512 21.478443]]
Methods tried:
1. HDBSCAN package
import hdbscan

coordinates = df5.values
print(coordinates)
# note: the haversine metric expects (lat, lon) in radians,
# while df5 appears to hold (lon, lat) in degrees
clusterer = hdbscan.HDBSCAN(metric='haversine', min_cluster_size=15)
clusterer.fit(coordinates)
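As a side note on units: a minimal NumPy-only sketch of what the haversine metric computes, using the first two coordinate pairs from the sample above. It assumes the columns are (longitude, latitude) in degrees, which is why a conversion to radians (and a swap to (lat, lon)) is needed before any haversine-based clusterer sees the data.

```python
import numpy as np

# Two sample (lon, lat) pairs in degrees, taken from the dataset above
p1 = np.array([108.67235, 22.38068])
p2 = np.array([110.579506, 16.173908])

# haversine-based metrics expect (lat, lon) in RADIANS,
# so swap the columns and convert from degrees first
lat1, lon1 = np.radians(p1[::-1])
lat2, lon2 = np.radians(p2[::-1])

# haversine great-circle distance, returned in radians
a = (np.sin((lat2 - lat1) / 2) ** 2
     + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
d_rad = 2 * np.arcsin(np.sqrt(a))

# multiply by the Earth's mean radius (~6371 km) to get kilometres
d_km = d_rad * 6371.0
```

Feeding degrees straight into a haversine metric does not raise an error, but the distances (and hence the clusters) come out wrong.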
2. scikit-learn DBSCAN with min_samples=15, metric='haversine', algorithm='ball_tree'
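For reference, a self-contained sketch of that DBSCAN configuration on synthetic data in the same coordinate range as the dataset above (the 1000-point sample, the 50 km eps, and the (lon, lat) column order are my assumptions, not from the original code). With algorithm='ball_tree' and a radian-valued eps, no full pairwise distance matrix is materialised:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Synthetic (lon, lat) pairs in degrees, a stand-in for df5.values
coords_deg = np.column_stack([
    rng.uniform(108, 124, 1000),   # longitude
    rng.uniform(16, 24, 1000),     # latitude
])

# haversine in scikit-learn expects (lat, lon) in radians
coords_rad = np.radians(coords_deg[:, ::-1])

# eps is in radians: 50 km divided by the Earth's radius (~6371 km)
eps_rad = 50.0 / 6371.0
labels = DBSCAN(eps=eps_rad, min_samples=15,
                metric='haversine', algorithm='ball_tree').fit_predict(coords_rad)
```

Points that do not reach min_samples neighbours within eps are labelled -1 (noise).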
Taking the advice of Anony-Mousse, I have tried ELKI as well.
All of these methods gave me the same MemoryError.
I have read these posts:
- DBSCAN for clustering of geographic location data
- Clustering 500,000 geospatial points in python
These posts suggest that a dataset of this size should not be a problem, yet I keep getting the error. I am sorry if this turns out to have a simple answer. Is it because of my settings, or simply because I am running it on my laptop with 16 GB of memory?