
I am using the Carrot2 tool for my thesis and running different clustering algorithms on it. My question is: how can I compare the results of the different algorithms scientifically? I mean, I need proof that the clustering results of algorithm 1 are better than the results of algorithm 2. Do entropy and purity values work for me, and if so, how can I apply them?

Thanks.

emre

3 Answers


The best way to compare two algorithms is, in my opinion, to show their performance on some real data and explain why they work well or don't work well in some special cases (e.g. one works well on dense data or sparse data, or data with variable density...). In some cases you may be able to give a theoretical proof that one algorithm has additional desirable properties compared to another, but that can be hard to do.

Also, to determine whether a result is good, you may need a domain expert to tell you whether the clusters make sense for your application domain.

Measures like entropy and purity are interesting. But ultimately a data mining technique is appropriate for a particular domain only if it generates meaningful results for that domain.

If you are developing a general clustering algorithm, then you can perhaps use these measures to show that your algorithm has better properties than another algorithm under certain conditions, and argue from that. But you will still need to illustrate with some real data why it works better in some cases.
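
To make this concrete, here is a minimal sketch (in Python, not Carrot2 code) of how purity and entropy could be computed once you have exported the cluster assignments and the ground-truth class labels; the label arrays below are hypothetical:

```python
import math
from collections import Counter

def purity(clusters, classes):
    # Assign each cluster to its majority class; purity is the fraction
    # of documents that fall into the majority class of their cluster.
    total = len(classes)
    correct = 0
    for c in set(clusters):
        members = [classes[i] for i in range(total) if clusters[i] == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / total

def entropy(clusters, classes):
    # Weighted average of the class entropy inside each cluster
    # (lower is better; 0 means every cluster contains a single class).
    total = len(classes)
    result = 0.0
    for c in set(clusters):
        members = [classes[i] for i in range(total) if clusters[i] == c]
        n = len(members)
        for count in Counter(members).values():
            p = count / n
            result -= (n / total) * p * math.log2(p)
    return result

# Hypothetical example: cluster IDs from one algorithm vs. known topic labels.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
classes  = ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'a']
print(purity(clusters, classes))   # 0.75
print(entropy(clusters, classes))  # ~0.69 bits; lower would be better
```

Keep in mind that purity alone is biased toward many small clusters (a clustering of singletons has purity 1), so it should not be the only measure you report.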

Phil

Comparing clustering results is unfortunately not trivial, in particular when it comes to overlapping, hierarchical and subspace results. The common measures only work for strict partitioning clusterings. And even then they have different biases, and there exist a dozen quality measures. So your result may well be better on one measure, and worse on another.

I don't know the details on Carrot2, as I am an ELKI user. For comparing clusterings it has, amongst others, various pair-counting measures (precision, recall, F1, Jaccard, Rand, Adjusted Rand, Fowlkes-Mallows), entropy-based measures ("normalized mutual information"), BCubed measures (again precision, recall and F1), set-matching measures (F1, purity and inverse purity), edit-distance based measures and a Gini-based measure. That is a total of about 20 quality measures. I have not yet found a good argument why one or the other is clearly superior; they all have their benefits and drawbacks. BCubed claims to be superior, but well, who doesn't?
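
As an illustration of what the pair-counting family does, these measures can be computed directly from two label vectors by counting, over all pairs of documents, whether the two clusterings agree on putting them together. This is a generic sketch in Python (not ELKI's or Carrot2's code), and the label vectors are hypothetical:

```python
from itertools import combinations

def pair_counting_scores(result, truth):
    # a: pair together in both clusterings, b: together only in `result`,
    # c: together only in `truth`, d: separated in both.
    a = b = c = d = 0
    for i, j in combinations(range(len(result)), 2):
        same_r = result[i] == result[j]
        same_t = truth[i] == truth[j]
        if same_r and same_t:
            a += 1
        elif same_r:
            b += 1
        elif same_t:
            c += 1
        else:
            d += 1
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "F1": f1,
        "Jaccard": a / (a + b + c) if a + b + c else 0.0,
        "Rand": (a + d) / (a + b + c + d),
    }

# Hypothetical clusterings of the same 6 documents by two algorithms.
print(pair_counting_scores([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1]))
```

The Adjusted Rand index additionally corrects for chance agreement, which is one reason it is often preferred when the clusterings being compared have different numbers of clusters.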

https://en.wikipedia.org/wiki/Cluster_analysis#External_evaluation gives details on some of these measures, but also no indication of when to use which.

Plus, experiments cannot prove that any algorithm is better than another. You might just have chosen bad parameters for the other! Or you might be using an inappropriate "algorithm 2". There are hundreds of clustering algorithms (ELKI is the largest collection of clustering algorithms I know, which is why I'm currently working with it!) and ideally you should be better than every single one of them. I currently do not think it makes much sense to invent yet another clustering algorithm. You might just be reinventing the wheel, and someone might already have invented exactly this clustering algorithm, or something even superior.

Has QUIT--Anony-Mousse

As others have mentioned, there is no "best" quality metric, simply because there is no "best" criterion of quality when you talk about unsupervised clustering. Some people (and applications) prefer small, compact clusters, others will tend to favor large, high-level clusters. Some go for hierarchical, others for flat (partitioning) results. Some will like crisp assignments, others will prefer fuzzy membership functions... this can go on forever.

For similar reasons, there is no "perfect" ground-truth set for performing such comparisons. It all depends on what the input data is, what the objective is, etc.

See the Carrot2 publication list at http://project.carrot2.org/publications.html; some of these publications include quality metrics and data sets you can reuse (minding my comments above). This one is probably the most applicable to clustering search results:

Claudio Carpineto, Stanisław Osiński, Giovanni Romano, Dawid Weiss: A survey of Web clustering engines. ACM Computing Surveys (CSUR), Volume 41, Issue 3 (July 2009), Article No. 17, ISSN 0360-0300

And, of course, we would welcome your contributions back to Carrot2 if you come up with an interesting new algorithm!