15

I have an undirected planar graph where each node has a weight. I want to split the graph into as many connected disjoint subgraphs as possible (EDIT: or, equivalently, to make the mean weight of the subgraphs as low as possible), given the condition that each subgraph has to reach a fixed minimum weight (the weight of a subgraph being the sum of the weights of its nodes). A subgraph containing only a single node is OK as well (if that node's weight reaches the fixed minimum).

What I have found out so far is a heuristic:

create a subgraph out of every node
while there is an underweight subgraph:
  select the subgraph S with the lowest weight
  find a subgraph N that has the lowest weight among the neighbouring subgraphs of S
  merge S into N

Clearly this is not optimal. Has anyone got a better solution? (Maybe I'm just ignorant and this is not a complex issue, but I have never studied graph theory...)
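
In code, the heuristic looks roughly like this (just a quick sketch, assuming the graph is given as adjacency lists with integer node weights; all the names are made up):

using System;
using System.Collections.Generic;
using System.Linq;

static class GreedyMerge
{
    // Returns, for each node, the id of the subgraph it ends up in.
    public static int[] Merge(List<int>[] adj, int[] weight, int minWeight)
    {
        int n = adj.Length;
        var label = Enumerable.Range(0, n).ToArray();   // one subgraph per node
        var members = Enumerable.Range(0, n).ToDictionary(v => v, v => new List<int> { v });
        var total = Enumerable.Range(0, n).ToDictionary(v => v, v => weight[v]);

        while (true)
        {
            // Subgraph S with the lowest weight among the underweight ones
            var under = total.Where(kv => kv.Value < minWeight).ToList();
            if (under.Count == 0) break;
            int s = under.OrderBy(kv => kv.Value).First().Key;

            // Neighbouring subgraph N with the lowest weight
            var neighbours = members[s].SelectMany(v => adj[v])
                                       .Select(u => label[u])
                                       .Where(id => id != s)
                                       .Distinct()
                                       .ToList();
            if (neighbours.Count == 0) break;   // isolated underweight component, give up

            int t = neighbours.OrderBy(id => total[id]).First();

            // Merge S into N
            foreach (int v in members[s]) label[v] = t;
            members[t].AddRange(members[s]);
            total[t] += total[s];
            members.Remove(s);
            total.Remove(s);
        }
        return label;
    }
}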

EDIT (more background details): The nodes in this graph are low-scale administrative units for which statistical data are to be provided. However, the units need to have a certain minimum population size to avoid conflicts with personal data legislation. My objective is to create aggregates so that as little information as possible is lost on the way. The neighbourhood relationships serve as graph edges, as the resulting units must be contiguous.

Most of the units (nodes) in the set are well above the minimum threshold. About 5-10 % of them are below the threshold, with varying sizes, as seen in the example (minimum size 50):

Example situation

Jan Šimbera
  • 457
  • 3
  • 16
  • 2
    Without the planarity condition, this problem certainly is strongly NP-hard. I don't expect an easy way to exploit planarity other than possibly a dynamic program whose running time is exponential in √n (as opposed to n, using the fact that planar graphs have treewidth O(√n)). – David Eisenstat Dec 08 '14 at 21:04
  • 1
    How big is the graph of interest? How important is it to get the optimal solution? – David Eisenstat Dec 08 '14 at 21:41
  • 1
    The maximum number is in tens of thousands of nodes, with appx. 5 edges per node, on average (it is a neighbourhood graph of an administrative division map). Running time is not the main concern, but the solution should be optimal. – Jan Šimbera Dec 08 '14 at 21:48
  • 1
    I would add the largest subgraphs that bring it to exactly the minimum - the smaller ones are more likely to be useful for padding later. – Mike Dec 10 '14 at 22:48
  • 2
    Have you implemented the heuristic? How many subgraphs does it create? What's the total weight divided by the threshold (a crude upper bound)? – David Eisenstat Dec 11 '14 at 15:47
  • As David says, to think up reasonable approximation schemes, we need as much information as you can possibly give. – Thomas Ahle Dec 11 '14 at 23:17
  • Added some example data. I really prefer a high-quality solution to any improvements in running time (though an exponential algorithm is unacceptable). – Jan Šimbera Dec 15 '14 at 10:14

3 Answers

5

This is an NP-hard optimization problem. For example, the Partition problem can easily be reduced to it (the planarity property does not get in the way). So an algorithm that calculates an optimal solution (and you seem to ask for an optimal solution in your comment) is unlikely to be practical for "tens of thousands of nodes".

If you don't actually need an optimal solution but a good one, I would use local optimization methods, such as tabu search or simulated annealing.

Because the mean weight of your subgraphs is just the weight of the total graph divided by the number of subgraphs, the only thing that matters is to find the maximum number of subgraphs you can attain. Guess this number, N, form an initial partitioning into N subgraphs, and then, for example, use local moves of (1) moving a node from one subgraph to another and (2) exchanging two nodes between two adjacent subgraphs, in search of an acceptable solution where every subgraph has the required minimum weight. If you can't find an acceptable solution at all, decrease N (e.g. by one) and restart until you find a solution.
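
For concreteness, here is a rough sketch of what move (1) could look like when driven by simulated annealing. It is only a sketch under some assumptions of mine: the graph is given as adjacency lists with integer node weights, the initial labels are 0..N-1, the cost being driven to zero is the total shortfall of the underweight subgraphs, and the node-exchange move (2) is left out.

using System;
using System.Collections.Generic;
using System.Linq;

class LocalSearch
{
    readonly List<int>[] adj;   // adjacency lists
    readonly int[] weight;      // node weights
    readonly int minWeight;     // required minimum subgraph weight
    readonly Random rng = new Random();

    public LocalSearch(List<int>[] adj, int[] weight, int minWeight)
    {
        this.adj = adj; this.weight = weight; this.minWeight = minWeight;
    }

    // Total shortfall of the underweight subgraphs; 0 means the partition is acceptable.
    int Shortfall(int[] label, int n)
    {
        var sums = new int[n];
        for (int v = 0; v < label.Length; v++) sums[label[v]] += weight[v];
        return sums.Sum(s => Math.Max(0, minWeight - s));
    }

    // Does subgraph `part` stay connected (and non-empty) if node `v` leaves it?
    bool StaysConnected(int[] label, int v, int part)
    {
        var members = new HashSet<int>(Enumerable.Range(0, label.Length)
                                                 .Where(u => u != v && label[u] == part));
        if (members.Count == 0) return false;
        var seen = new HashSet<int> { members.First() };
        var queue = new Queue<int>(seen);
        while (queue.Count > 0)
            foreach (int w in adj[queue.Dequeue()])
                if (members.Contains(w) && seen.Add(w)) queue.Enqueue(w);
        return seen.Count == members.Count;
    }

    // Tries to make `label` (a partition into n subgraphs, labels 0..n-1) acceptable.
    public bool Anneal(int[] label, int n, int iterations = 200000, double t0 = 10.0)
    {
        int cost = Shortfall(label, n);
        for (int it = 0; it < iterations && cost > 0; it++)
        {
            double t = t0 * (1.0 - (double)it / iterations) + 1e-9;   // cooling schedule

            // Move (1): relabel a boundary node into a neighbouring subgraph.
            int v = rng.Next(label.Length);
            var targets = adj[v].Select(u => label[u]).Where(p => p != label[v]).Distinct().ToList();
            if (targets.Count == 0 || !StaysConnected(label, v, label[v])) continue;

            int oldPart = label[v];
            label[v] = targets[rng.Next(targets.Count)];
            int newCost = Shortfall(label, n);

            // Accept improvements always, worsenings with the usual annealing probability.
            if (newCost <= cost || rng.NextDouble() < Math.Exp((cost - newCost) / t))
                cost = newCost;
            else
                label[v] = oldPart;   // undo the move
        }
        return cost == 0;
    }
}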

Antti Huima
  • 25,136
  • 3
  • 52
  • 71
  • Thanks a lot. I was initially searching for an algorithm giving the optimal solution, but your solution seems like the best choice I found after finding out it's NP-hard. – Jan Šimbera Dec 18 '14 at 15:00
2

The details of the instance change everything.

Call a vertex heavy if it's over threshold by itself. It never makes sense to have a subgraph with two heavy vertices, because we could split it in two. Thus, the number of subgraphs in the optimal solution is related additively to the number of subgraphs containing no heavy vertex.

As a result, we can delete the heavy vertices and focus on making as many valid subgraphs as possible out of what's left. Judging by your map, what's left will consist of many connected components, each with only a handful of vertices. Valid subgraphs must be connected, so these components can be solved independently. On small instances, one can (e.g.) enumerate all valid subgraphs and then run Algorithm X to find a maximum-cardinality packing.
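
A small sketch of that preprocessing step (my own code, assuming an adjacency-list graph with integer node weights): it marks the heavy vertices and returns the connected components of what's left, each of which can then be handed to the exact packing step independently.

using System;
using System.Collections.Generic;
using System.Linq;

static class HeavySplit
{
    // Connected components of the subgraph induced by the light (below-threshold) vertices.
    public static List<List<int>> LightComponents(List<int>[] adj, int[] weight, int threshold)
    {
        int n = adj.Length;
        var light = new HashSet<int>(Enumerable.Range(0, n).Where(v => weight[v] < threshold));
        var seen = new HashSet<int>();
        var components = new List<List<int>>();

        foreach (int start in light)
        {
            if (!seen.Add(start)) continue;
            var component = new List<int>();
            var stack = new Stack<int>();
            stack.Push(start);
            while (stack.Count > 0)
            {
                int v = stack.Pop();
                component.Add(v);
                foreach (int u in adj[v])
                    if (light.Contains(u) && seen.Add(u)) stack.Push(u);
            }
            components.Add(component);   // typically only a handful of vertices each
        }
        return components;
    }
}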

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
1

The problem is NP-hard, so I will outline an exponential solution. This solution has many points that could be improved; I will highlight some of them.

The whole idea is: the vertices of each part of a partition are connected by some set of edges, so if you try all possible sets of edges you are guaranteed to generate every valid partition of the graph. You then find the best case by counting the number of sets in each partition (the optimality criterion).

Unlike your previous approach, this one has a full search space to explore. The solution uses the following:

- Disjoint sets: to represent the partition
- Power set: to enumerate all the possible edge sets

public Partition Solve(Graph g, int min)
{
    int max = 0;
    Partition best = null;

    // Try every possible subset of edges
    foreach (var S in PowerSet(g.Edges))
    {
        // Build the vertex partition induced by this edge subset
        var partition = BuildPartition(g, S);

        // Skip it if some component is below the minimum weight
        if (IsInvalid(partition, min))
            continue;

        // Skip it if we already have at least as many subgraphs
        if (max >= partition.Length)
            continue;

        // Update the best solution found so far
        max = partition.Length;
        best = partition;
    }

    return best;
}

public Partition BuildPartition(Graph g, IEnumerable<Edge> edges)
{
    // Initially every vertex is alone in its own set
    var partition = new DisjointSet(g.Vertexes);

    foreach (var edge in edges)
    {
        // If the endpoints of this edge are already in the same set, do nothing
        if (partition.Find(edge.V1) == partition.Find(edge.V2))
            continue;

        // Join both subsets
        partition.Union(edge.V1, edge.V2);
    }

    return partition;
}

public bool IsInvalid(Partition p, int min)
{
    return p.Sets.Any(t => t.Sum(v => v.Weight) < min);
}

You can improve the solution in the following aspects:

- Add parallelism to the power-set enumeration and the IsInvalid check
- Find a better way to generate valid edge sets
- Add a special starting case for vertices with more weight than the minimum (each of them can always go in a separate subgraph)

The order of the algorithm is dominated by the power set:

- Power set: a planar graph with N vertices has at most 3N-6 edges, so there are O(2^E) edge subsets with E ≤ 3N-6
- BuildPartition: O(V + E log V), i.e. O(N log N) here
- IsInvalid: O(V)

Overall, Solve is O(2^E * N * log N). Use this formula to estimate the number of operations.

Hope this helps!

Miguel
  • 3,786
  • 2
  • 19
  • 32
  • Thanks! However, for a 10000-vertex graph the number of checks needed is appx. `10^3000`, which is not really sustainable... – Jan Šimbera Dec 18 '14 at 14:56
  • You are trying to solve an NP-hard problem! Almost nothing will work. With this you can find the optimum, and any other exact algorithm will spend roughly the same effort. On the other hand, you could try a metaheuristic or an evolutionary algorithm to get a partial (approximate) solution. – Miguel Dec 18 '14 at 16:16
  • I'm now certain that an exact algorithm is not feasible. What I need is a heuristic that can approximate a very good solution while running in polynomial time... – Jan Šimbera Dec 19 '14 at 16:59