Metric/density based clustering/grouping

Question

I have a finite number of points (cloud), with a metric defined on them. I would like to find the maximum amount of clusters in this cloud such that:

1) the maximum distance between any two points in one cluster is smaller a given epsilon (const)

2) each cluster has exactly k (const) points in it

I looked at all kind of different clustering methods and clustering with a restriction on the inner maximum distance is not a problem (density based). The 2) constrain and the requirement to find "the maximum amount of clusters s.t." seem to be problematic though. Any suggestions for an efficient solution?

Thank you, A~

possible duplicate of [K-means algorithm variation with equal cluster size](http://stackoverflow.com/questions/5452576/k-means-algorithm-variation-with-equal-cluster-size) — Has QUIT--Anony-Mousse, Feb 17 '13 at 10:17

score 1 · Answer 1 · answered Feb 15 '13 at 22:37

Given your constraints, there might be no solution. And actually, that may happen quite often...

The most obvious case is when you don't have a multiple of k points.

But also if epsilon is set too low, there might be points that cannot be put into clusters anymore.

I think you need to rethink your requirements and problem, instead of looking for an algorithm to solve an unreasonably hard requirement that might not be satisfiable.

Also consider whether you really need to find the guaranteed maximum, or just a good solution.

There are some rather obvious approaches that will at least find a good approximation fast.

Could you specify the obvious approaches a bit? To me they are (unfortunately) not obvious. Also if the constraints are not met (no solution) the epsilon value would be increased and the query re-run (if more than k points). — aZen, Feb 16 '13 at 19:00

score 1 · Answer 2 · edited May 23 '17 at 11:49

I have the same impression as @Anony-Mousse, actually: you have not understood your problem and requirements yet.

If you want your cluster sizes to be k, there is no question of how many clusters you will get: it's obviously n /k. So you can try to use a k-means variant that produces clusters of the same size as e.g. described in this tutorial: Tutorial on same-size k-means and set the desired number of cluster to n/k.

Note that this is not a particular sensible or good clustering algorithm. It does something to satisfy the constraints, but the clusters are not really meaningful from a cluster analysis point of view. It's constraint satisfaction, but not cluster analysis.

In order to also satisfy your epsilon constraint, you can then start off with this initial solution (which probably is what @Anony-Mousse referred to as "obvious approaches") and try to perform the same kind of optimization-by-swapping-elements in order to satisfy the epsilon condition.

You may need a number of restarts, because there may be no solution.

Metric/density based clustering/grouping

2 Answers2