5

I have two questions to be precise. Firstly, I would like to know if there is an easy way to adapt the Markov Clustering Algorithm so that I can specify in advance, how many clusters I would like to have at the end. If not, which similiar algorithm would you recommend?

And secondly how should be dealt with overlapping clusters in the Markov world?

user2560216
  • 63
  • 1
  • 5

1 Answers1

14

1). There is no easy way to adapt the MCL algorithm (note: its name is 'Markov cluster algorithm' without the 'ing'. Many people verbalise it as in 'doing Markov clustering', which is fine) to output a specified number of clusters. This is in my opinion, for 99.99% of the time a highly desirable feature. If I were to do what you want, I would generate 4 or 5 clusterings at different levels of granularity (say setting the MCL inflation parameter to 1.4, 2.0, 3.0, 4.0 and 6.0, but it could be worthwhile to do a few more and pick based on the distribution of cluster sizes), then unify them in a hierarchical clustering (the program 'clm close' can do that). After that one could traverse the tree and try to find an optimal clustering of the desired size. This obviously requires significant effort. I have done something similar but not quite the same in the past.

2). Overlapping clusterings produced by MCL are extremely rare, and always a result of symmetry in the input graph. The standard MCL implementation that most people use (from http://micans.org/mcl/) will remove overlap. This in my opinion is not a concern. Disclaimer: I authored MCL.

micans
  • 1,106
  • 7
  • 16
  • well that's actually a good idea. using different inflation values is kind of try and error but doable. thanks. – user2560216 Jul 23 '13 at 10:16
  • The current development mcl has a new option where an input clustering is specified: it will construct a subgraph on that clustering (by removing inter-cluster edges) and proceed with clustering. This could conceivably be useful. Another point: have you tried methods that allow the number of clusters to be specified, e.g. graph partitioning by spectral methods (I believe hmetis is such a method) or spectral clustering? (and there must be many other such methods). – micans Jul 23 '13 at 13:21
  • @micans, i'm new to MCL and just went through these slides: http://www.cs.ucsb.edu/~xyan/classes/CS595D-2009winter/MCL_Presentation2.pdf, where it mentions about the `power parameter e` that controls the expansion operation. I don't see this parameter in the official MCL manual: http://micans.org/mcl/man/mcl.html#options. Is it implicitly set somewhere, if not, is there a guideline for choosing a value for it? – MLister Jul 30 '13 at 00:18
  • 1
    @MLister This parameter is used in the abstract formulation of the Markov Cluster process, but in practice it is always set to two (implying that 2-step random walks are computed). Taking other (higher) values is computationally very expensive and of little use. Higher values will lower cluster granularity, but this granularity can also be lowered by decreasing inflation. – micans Aug 06 '13 at 11:14