I would like to cluster geodata (at least coordinates and height) using a density-based algorithm. I found that DBSCAN should work pretty well for my purpose. I want to keep even small separate clusters, so I use minpts of 1 or 2. That does the job, but the remaining points end up as one huge cluster or as noise, and I want those to be split into smaller groups as well.

For example, if I have two groups of high points (like mountains) located in different places on the map, I want them to be in two separate clusters. How can I achieve this? Is there perhaps a way to set a maximum number of points per cluster in the algorithm? I would appreciate any advice.

P.S. I used R for this purpose, but the question is more about the approach.

Nata
  • Could you provide some of the code that you have tried and a sample of your data? Please see: [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Random Cotija Jun 29 '18 at 13:03

1 Answer

Don't set minpts too small.

1 or 2 points are not "clusters". These points are "noise". Just treat all points in noise as separate clusters, or connect those within a short enough distance if you really want to. You can easily post-process noise.
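As a rough illustration of that post-processing step, here is a minimal sketch in Python (chosen only for illustration, since the question is about the approach). It assumes the noise label is 0, as in R's `dbscan` package; the helper name `regroup_noise`, the `max_dist` threshold, and the sample data are all made up for this example.

```python
from math import dist

NOISE = 0  # assumed noise label (R's dbscan package convention)


def regroup_noise(points, labels, max_dist):
    """Give noise points their own cluster ids, merging noise points
    that lie within max_dist of each other (single-link on noise only)."""
    noise_idx = [i for i, lab in enumerate(labels) if lab == NOISE]
    parent = {i: i for i in noise_idx}

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of noise points within max_dist (brute-force scan).
    for a_pos, a in enumerate(noise_idx):
        for b in noise_idx[a_pos + 1:]:
            if dist(points[a], points[b]) <= max_dist:
                parent[find(a)] = find(b)

    # Hand out fresh ids after the largest existing cluster id.
    next_id = max((lab for lab in labels if lab != NOISE), default=0) + 1
    new_ids = {}
    result = list(labels)
    for i in noise_idx:
        root = find(i)
        if root not in new_ids:
            new_ids[root] = next_id
            next_id += 1
        result[i] = new_ids[root]
    return result


# Hypothetical (x, y, height) points: one real cluster plus three noise points.
points = [(0, 0, 10), (0, 1, 10), (1, 0, 11),
          (50, 50, 900), (50, 51, 905), (200, 0, 5)]
labels = [1, 1, 1, NOISE, NOISE, NOISE]
new_labels = regroup_noise(points, labels, max_dist=10.0)
print(new_labels)
```

The two nearby high points end up in one new cluster and the isolated point gets a cluster of its own, while the cluster DBSCAN already found is left untouched. Note that height and map coordinates are mixed in one Euclidean distance here, which is only sensible if they are on comparable scales.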

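But for density-based clustering, you need more points for density to mean anything. In fact, for minpts up to 2, DBSCAN degenerates to single-link clustering: any two points within the radius of each other get connected, so long chains of points tend to merge into one large, stretched-out cluster.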
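To make the effect of minpts concrete, here is a small brute-force DBSCAN sketch in Python. It is written for illustration only and is not the implementation of any particular package; the function name, the 0-for-noise convention, and the toy data are assumptions of this example.

```python
from math import dist

NOISE = 0  # assumed noise label; real clusters are numbered from 1


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with a brute-force neighbor search (illustration only)."""
    n = len(points)
    # A point's neighborhood includes the point itself.
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Four points strung out in a chain, plus one isolated point.
chain = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(dbscan(chain, eps=1.5, min_pts=1))
print(dbscan(chain, eps=1.5, min_pts=2))
print(dbscan(chain, eps=1.5, min_pts=4))
```

With `min_pts` of 1 or 2 the whole chain is linked into a single cluster because every point has at least one neighbor within `eps`, which is exactly the single-link behavior. With `min_pts=4` no point in the chain is dense enough to be a core point, so everything is reported as noise.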

Has QUIT--Anony-Mousse
  • Thank you for the advice. Could you explain how to remove connections between points that are too far apart (more than some defined value)? – Nata Jul 03 '18 at 08:28
  • Define "connection". – Has QUIT--Anony-Mousse Jul 03 '18 at 17:59
  • I mean the distances between points within one cluster. For example, in the following plot I have one cluster that is very large in terms of distances, which I don't need. I can delete it, of course, so that its points become noise, and that's OK for me. But I don't understand why it appears. http://ipic.su/img/img7/fs/kiss_13kb.1530689835.png – Nata Jul 04 '18 at 07:40
  • You are probably plotting the *detected* noise as a cluster? Check the documentation of the return value. – Has QUIT--Anony-Mousse Jul 04 '18 at 23:33
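
The point about the return value in the last comment can be sketched as follows. This assumes the convention of R's `dbscan` package, where cluster id 0 marks noise; other implementations use a different value (scikit-learn uses -1), so check the documentation of whichever one you use. The helper `split_noise` and the sample data are hypothetical.

```python
def split_noise(points, labels, noise_label=0):
    """Separate noise points from real clusters before plotting."""
    clusters = {}
    noise = []
    for point, label in zip(points, labels):
        if label == noise_label:
            noise.append(point)  # not a cluster: plot separately or drop
        else:
            clusters.setdefault(label, []).append(point)
    return clusters, noise


# Hypothetical result: two clusters and two scattered noise points.
points = [(0, 0), (0, 1), (40, 7), (9, 9), (80, 3)]
labels = [1, 1, 0, 2, 0]
clusters, noise = split_noise(points, labels)
print(clusters)
print(noise)
```

If the noise label is drawn like any other cluster id, the scattered noise points show up as one "cluster" spanning the whole map, which would match the plot described above.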