2

I have been using METIS for clustering social media users.

By default, it was outputting clusters with same number of vertices in each side, which is not ideal in real world scenario. So, I was trying to find way to loosen the constraint of "same number of vertices" and get possible imbalance partition with minimized cut value.

I find a parameter ufactor in the manual which is suitable(I think) for my case but I did not grasp what it is really doing. I have large graph and tried with some value of ufactor. For one data set ufactor=1000 works very well but for another dataset it could not even partition the graph. I can not interpret this result as i did not understand what it's really doing. Here is what i find in the manual about this:

Specifies the maximum allowed load imbalance among the partitions. A value of x indicates that the allowed load imbalance is (1 + x)/1000. The load imbalance for the jth constraint is defined to be max_i(w[j, i])/t[j, i]), where w[j, i] is the fraction of the overall weight of the jth constraint that is assigned to the ith partition and t[j, i] is the desired target weight of the jth constraint for the ith partition (i.e., that specified via -tpwgts). For -ptype=rb, the default value is 1 (i.e., load imbalance of 1.001) and for -ptype=kway, the default value is 30 (i.e., load imbalance of 1.03).

Can anybody help me to interpret this? Here, what is jth constraints? what is -ptype=rb/kway?

Tobias Ribizel
  • 5,331
  • 1
  • 18
  • 33
sovon
  • 877
  • 2
  • 12
  • 28

1 Answers1

2

First of all, let me mention that I think METIS is the wrong tool here, because it is used for graph partitioning, where the emphasis is on minimizing the number of edges between partitions while keeping the partitions balanced (more or less equal sizes)

What you probably want to do is community detection within social networks, i.e. the search for clusters which maximize internal connectivity (large number of edges between nodes from the same cluster) and minimize external connectivity (small number of edges between different clusters).
This can be achieved by maximizing the so-called Modularity of the clustering

There are several approaches to tackle this problem, a popular heuristic being Label propagation.
If you don't want to implement the algorithm yourself, I would recommend using a framework like NetworKit (unfortunately, I don't know any other such frameworks yet), which implements Label propagation, some modularity-based algorithms and many helpful tools.

But back to your original question:

What is -ptype=rb/kway?

There are multiple ways how you can approach the graph partitioning problem: You can either try to partition the graph into your desired number of partitions directly (k-way partitioning) or you can split the graph in half repeatedly until you have the desired number of partitions (recursive bisection, rb)

What is the jth constraint?

METIS allows you to try and optimize multiple balance constraints at the same time, i.e. if you have multiple types of calculations on the graph that should all be more or less balanced among the compute nodes.

See the manual:

Many important types of multi-phase and multi- physics computations require that multiple quantities be load balanced simultaneously. [...]
METIS includes partitioning routines that can be used to partition a graph in the presence of such multiple balancing constraints. Each vertex is now assigned a vector of m weights and the objective of the partitioning routines is to minimize the edge-cut subject to the constraints that each one of the m weights is equally distributed among the domains.

EDIT: Since you clarified that you wanted to look at a fixed number of clusters, I see how graph partitioning could be helpful here. Let me illustrate what ufactor means:

The imbalance of a partitioned graph is (in this simple case) computed as the maximum of the imbalance for each partition, which is roughly the quotient partition size / average partition size. So if we allow a maximum imbalance of 2, this means that the largest partition is twice as big as the average partition. Note however that ufactor doesn't specify the imbalance directly, it specifies how many permille away from 1 the imbalance is allowed to be. So ufactor=143 actually means that your maximal allowed imbalance is 1.143, which makes sense since your clusters are not that far from each other. So in your case, you will probably use larger values for ufactor to allow the groups to be of quite different sizes.

Consequences of large imbalance

If your imbalance is too large, it might happen that all the strongly-connected parts land in the same partition while only isolated nodes are put in the other partitions. This is due to the fact that the algorithm tries to minimize the number of cut edges between different partitions, which will be lower if we put all the high-degree nodes in the same partition.

What about spectral partitioning, ...?

The general approach of METIS works as follows:
Most input graphs are too large to partition directly, which is why so-called multilevel methods are used:

  • The graph is first coarsened (nodes are combined while trying to preserve the graph structure) until its size becomes feasible to partition directly
  • The coarsest graph is partitioned using an initial partitioning technique, where we could use a variety of approaches (combinatorial bisection, spectral bisection, exact solutions using ILPs, ...).
  • The graph is then uncoarsened, where in each step a small number of nodes are moved from partition to partition in a local search to improve the overall edge cut.

My personal recommendation

I should however note that while graph partitioning might be a valid model for your case, METIS itself might not be the ideal implementation for you:

As you can read on the METIS homepage, it is mostly used for rather sparse graphs ('finite element methods, linear programming, VLSI, and transportation'), whereas social networks are much denser and have a different structure (degrees follow a power-law distribution)

The coarsening approach of METIS uses heavy edge matching to combine nodes which are somehow close together, which works great for the intended applications, for social networks however, clustering-based coarsening techniques might prove more efficient.
Another library that is a bit slower in general, but implements some presets especially for social networks is KaHIP, see the manual for details.

(I should mention however that I am biased in this regard, since I worked extensively with this library ;-) )

Tobias Ribizel
  • 5,331
  • 1
  • 18
  • 33
  • [Here](https://drive.google.com/open?id=0B9cF34iSt6PAMXB2bXFsWHRLU1U) is "why I need to use Metis" and some questions for you. If you don't want to read the whole file just read the last section for my questions. Specially the last question is the most important for me. Please take a look. @Tobias Ribizel – sovon Jul 11 '17 at 15:30
  • @sovon I added some more information to my answer, however it might be helpful to know your problem sizes (how many nodes/edges does your graph have) to give any further recommendations – Tobias Ribizel Jul 11 '17 at 22:18
  • number of nodes are 320577 and edges 1209489. I removed the isolated components before-hand. Those nodes and edges constitute a single giant component. At ufactor=500, number of cut: 20986 and coverage: 0.9826488707214369. without ufactor, number of cut=21012 – sovon Jul 12 '17 at 07:40
  • Another thing is, If I do only spectral partitioning on my toy graph, it gave me expected partitioning. So, i guess, coarsening+spectral partition+refinement and un-coarsening would have been a good solution. But, I can not specify the partition algorithm in METIS – sovon Jul 12 '17 at 08:10
  • Got the idea about ufactor. see my question-answer [here](https://stackoverflow.com/questions/45041064/interpretation-of-ufactor-on-a-toy-graph-clustering/45063170#45063170) – sovon Jul 12 '17 at 16:31
  • As far as I can see, the only two possible initial partition algorithms (`-iptype=...`) are a greedy bisection approach (`grow`) and a random partition (`random`) of the coarsest graph. Just because spectral partitioning works well in your toy graph does not give you a lot of information on how well it will work in larger settings. Note that METIS itself claims that its partitions are consistently 10-50% better than those obtained using spectral techniques. This might also be a problem of perception ('What is a good result?'), but it would probably be best to move this discussion to a chat. – Tobias Ribizel Jul 18 '17 at 09:21