I am looking for an algorithm that solves the following problem:
- Given: a set of items and their similarity matrix.
- Goal: group these items into "clusters", each containing at least m items
- Conditions:
- There are no cluster-like structures in the dataset, as shown in Figure 1
- Nevertheless, the items within a group should be similar to each other, so that the overall within-group similarity is high.
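To make the goal concrete, here is a minimal sketch of how a candidate grouping could be checked and scored. The function name `score_grouping` is hypothetical, and using the mean pairwise within-group similarity as the "global similarity" is only one possible reading of the objective, not a fixed definition.

```python
import numpy as np

def score_grouping(S, labels, m):
    """Return (is_valid, mean_within_similarity) for a candidate grouping.

    S      : (n, n) symmetric similarity matrix
    labels : length-n sequence of group ids, one per item
    m      : required minimum group size
    Assumption: "global similarity" is taken to be the mean similarity
    over all pairs of items that share a group.
    """
    S = np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    groups, counts = np.unique(labels, return_counts=True)

    # The size constraint: every group needs at least m items.
    is_valid = bool(np.all(counts >= m))

    total, n_pairs = 0.0, 0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        if len(idx) < 2:
            continue  # a singleton contributes no within-group pairs
        block = S[np.ix_(idx, idx)]
        upper = np.triu_indices(len(idx), k=1)  # each pair counted once
        total += block[upper].sum()
        n_pairs += len(upper[0])

    mean_sim = total / n_pairs if n_pairs else float("nan")
    return is_valid, mean_sim
```

An algorithm that fits my problem would search for the labelling that maximizes the second value while keeping the first one true.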
The motivation is not to identify good clusters but to split a dataset into groups of high similarity, each with at least m items. Partitioning around medoids (PAM) does not work out of the box: it would produce a lot of one-item clusters. Hierarchical approaches (AGNES, DIANA) do not help either.
This problem is somewhat similar to the Stable Marriage problem: each item ranks its neighboring items by similarity. Here, however, each group (each "marriage") must contain at least m items.
Thanks in advance!