In short: I am using k-means clustering with correlation distance. How can I check how many clusters should be used, if any?
There are many indices and answers on how to establish the number of clusters when grouping data: example 1, example 2, etc. For now, I am using Dunn's index, but it is not sufficient for one of the reasons described below.
All of those approaches exhibit at least one of the following problems, which I have to avoid:
Indexes:
- the derivation of the clustering quality index makes assumptions about the data covariance matrix, so from that point on only Euclidean or Euclidean-like metrics apply and correlation distance is no longer an option
- the index needs at least two non-empty clusters in order to compare already computed partitions, so it cannot tell whether there is any reason to divide the data into groups at all
Clustering approaches:
- clustering approaches that estimate the number of clusters themselves (e.g. affinity propagation) are much slower and do not scale well
To sum up: is there any criterion or index that can check for the existence of groups in the data (and perhaps estimate their number), without restricting the metric used?
EDIT: The space I am operating on has up to a few thousand features.
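
To make the "at least two non-empty clusters" problem concrete, here is a minimal sketch (assuming NumPy and SciPy) of Dunn's index computed from a precomputed correlation-distance matrix. The helper name `dunn_index`, the toy sine/cosine data, and the choice of complete diameter with single-linkage separation are illustrative assumptions, not necessarily the exact variant I use.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def dunn_index(dist, labels):
    """Dunn index from a precomputed square distance matrix (hypothetical helper).

    Ratio of the smallest between-cluster distance to the largest
    within-cluster diameter; higher means better separated clusters.
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        # The limitation described above: with one cluster there is no
        # between-cluster distance, so the index is undefined.
        raise ValueError("Dunn index needs at least two non-empty clusters")

    # Largest within-cluster diameter (maximum pairwise distance).
    max_diameter = max(
        dist[np.ix_(labels == c, labels == c)].max() for c in clusters
    )
    # Smallest distance between points that lie in different clusters.
    min_separation = min(
        dist[np.ix_(labels == a, labels == b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter


# Toy data (assumption): 10 noisy sine profiles and 10 noisy cosine profiles.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
X = np.vstack([
    np.sin(t) + 0.1 * rng.standard_normal((10, 50)),
    np.cos(t) + 0.1 * rng.standard_normal((10, 50)),
])
labels = np.repeat([0, 1], 10)

# Correlation distance (1 - Pearson r) between all pairs of rows.
D = squareform(pdist(X, metric="correlation"))
score = dunn_index(D, labels)
print(f"Dunn index with correlation distance: {score:.2f}")
```

Because the function only touches the distance matrix, it works with any metric, but it still cannot say whether splitting the data into groups is justified in the first place.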