
The problem is as follows: given a DAG and a number 0 < p ≤ 1, return a minimum-cardinality set of vertices whose removal disconnects at least a p-fraction of the paths from sources (vertices with no incoming arcs) to sinks (vertices with no outgoing arcs). For p = 1, the problem is equivalent to minimum cut. For other values of p, however, I'm not sure what the answer would be.

An algorithm I'm thinking of is to first compute the min-cut set for the DAG and then try to prune it to satisfy the criterion. It would be interesting by itself to see whether the subset we find is actually a minimum cut set for the specific p that is given. The problem with this algorithm is that it is wasteful: it computes many nodes that we don't need in the final answer and, in fact, is solving a "bigger" problem in the first place.

Any pointers to a solution of this problem? Wouldn't it be possible, for some of the min-cut algorithms, to add this extra constraint as an early-stopping criterion?

For checking how many paths are removed, suppose we have indexed each vertex (and will keep the index updated if needed) so that we know how many paths get disconnected by its removal. Please don't worry about the complexity of updating the index. One last thing: there is no constraint on the resulting components in terms of size or anything else.

David Eisenstat
Dant3
  • s-t paths for some given designated vertices s and t? Or any paths? – Niklas B. Sep 21 '15 at 11:04
  • How does a set of vertices disconnect anything? Do you mean a bipartition of the vertices, with all edges between vertices in different parts considered to be deleted? – j_random_hacker Sep 21 '15 at 11:05
  • Also I'm not sure that p=1 is equivalent to minimum cut: p=1 implies all paths between all vertex pairs must be destroyed, implying *all* edges must be deleted. Even if you only meant "paths between vertices in different parts of the bipartition", min-cut still doesn't necessarily return a bipartition (A, B) that minimises the "minimum-cardinality set" (which I take here to be min(|A|, |B|)). – j_random_hacker Sep 21 '15 at 11:08
  • @j_random_hacker Well the second scenario would actually make sense. From what I understand, OP wants to minimize the number of cut vertices, not the cardinality of one of the partitions. Of course we don't know if that's actually the scenario OP is interested in – Niklas B. Sep 21 '15 at 11:20
  • In any given solution, how can we verify that *the number of removed paths exceeds (p-fraction of paths)*? Isn't that a hard problem in itself? – A.S.H Sep 21 '15 at 11:43
  • In other words, consider the decision problem: given a DAG, a source s, a sink t, and a set of vertices, does removing this set of vertices eliminate a fraction p of paths from s to t? – A.S.H Sep 21 '15 at 11:49
  • @A.S.H Since we're talking about a DAG, the decision problem has a linear-time dynamic program. – David Eisenstat Sep 21 '15 at 11:50
  • @DavidEisenstat I did not know that. Although it sounds intuitively correct, do you have a reference? Thanks :) – A.S.H Sep 21 '15 at 11:53
  • @A.S.H I think it's probably folklore. Here's an explanation on Stack Overflow: http://stackoverflow.com/questions/5164719/number-of-paths-between-two-nodes-in-a-dag – David Eisenstat Sep 21 '15 at 11:57
  • Thanks. I probably found an O(N) solution to the actual problem based on the method in that thread :) – A.S.H Sep 21 '15 at 12:11
  • UPDATE: Sorry for incomplete description. The nodes with zero in-degree are sources and nodes with zero out-degree are sinks (which can be simplified to s-t case). By paths, I meant all s-t paths for all pairs of s's and t's. Also for checking how many paths are removed, suppose we have indexed each vertex (and keep it updated if needed) so that we know how many paths get disconnected by their removal. Please do not worry for the complexity of the index being updated. – Dant3 Sep 21 '15 at 12:23
  • So are you looking to delete a set of *vertices*? The part about min-cut led me to think that you were looking to delete a set of *edges*. – j_random_hacker Sep 21 '15 at 12:26
  • @j_random_hacker I guess the two problems (deleting vertices or edges) can be easily converted to each other. But in this case, I'm after deleting the vertices. – Dant3 Sep 21 '15 at 12:31

2 Answers


Here's an idea for getting approximately optimal solutions.

There's a variant of set cover that I want to call partial set cover where we have sets and elements and want the minimum number of sets whose union contains a p-fraction of elements. To apply to this problem, sets correspond to nodes, and elements correspond to maximal paths. (Yes, there are too many paths to do this naively; see below.) A set contains an element if and only if the node is contained in the path.

It's not too hard to write partial set cover as an integer program.

minimize sum_{sets S} x_S
subject to
for all elements e, sum_{sets S containing e} x_S >= y_e
sum_{elements e} y_e >= p * number of elements
for all sets S, x_S in {0, 1}      # 1 if set S is chosen
for all elements e, y_e in [0, 1]  # 1 if element e is covered

This program can be solved by an integer program solver. The solvers are surprisingly good, though they of course cannot promise optimal solutions to this NP-hard set cover generalization.
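To make the formulation concrete, here is a tiny brute-force check of the partial set cover objective. This is not how you'd solve a real instance (that's what the integer program solver is for); the function name and representation are hypothetical, chosen just to illustrate the objective and constraints on toy data.

```python
from itertools import combinations

def partial_set_cover_bruteforce(sets, num_elements, p):
    """Return (k, indices) for a smallest family of k sets whose
    union covers at least p * num_elements elements.

    Exponential-time brute force over all subset sizes in increasing
    order, so the first feasible family found is minimum-cardinality.
    """
    need = p * num_elements
    for k in range(len(sets) + 1):
        for combo in combinations(range(len(sets)), k):
            covered = set().union(*(sets[i] for i in combo))
            if len(covered) >= need:
                return k, combo
    return None  # unreachable if the union of all sets covers everything
```

For example, with sets [{0,1}, {1,2}, {2,3}] over 4 elements, p = 0.5 needs only one set, while p = 1 needs two.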

In an interesting DAG, there are of course far too many paths to be enumerated. Sampling to the rescue! Since it's easy to count maximal paths in DAGs, it's easy to sample them uniformly at random. Sample a number of paths and use these as the elements in the integer program.
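A sketch of the uniform sampler for the single-source/single-sink case (the comments note that the general case reduces to this via a super-source and super-sink). The representation here is an assumption: `adj` maps each vertex to its list of successors, and every vertex is assumed to reach the sink.

```python
import random
from functools import lru_cache

def sample_maximal_path(adj, source, sink, rng=random):
    """Sample one source -> sink path uniformly at random in a DAG.

    Counts paths-to-sink with memoized recursion, then walks forward
    from the source, choosing each successor with probability
    proportional to the number of paths continuing through it.
    Assumes every vertex reachable from the source can reach the sink
    (true after the super-source/super-sink reduction).
    """
    @lru_cache(maxsize=None)
    def count(v):
        if v == sink:
            return 1
        return sum(count(w) for w in adj.get(v, []))

    path = [source]
    v = source
    while v != sink:
        succs = adj[v]
        weights = [count(w) for w in succs]
        v = rng.choices(succs, weights=weights)[0]
        path.append(v)
    return path
```

Weighting by the downstream path count is what makes the sample uniform: a successor that lies on twice as many continuations is chosen twice as often.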

The tradeoff is that with more paths, the estimate of the objective is better, but the time to solve the integer program is worse. Hoeffding's inequality gives some hint as to the proper choice of the number of samples. With n vertices, there are 2^n possible solutions. We want the estimated fraction for each of these to be accurate to within some epsilon. Hoeffding says that we need to choose the number of samples to be Theta(n/epsilon^2) so that, almost all of the time, all of the estimates are approximately correct. I'd work out the exact constant, but I doubt that it's practically relevant.
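One way to make the constant concrete, under the stated assumptions: Hoeffding's inequality plus a union bound over the 2^n candidate vertex sets gives m ≥ (n ln 2 + ln(2/δ)) / (2ε²) samples so that every estimate is within ε with probability at least 1 − δ. A small sketch:

```python
import math

def hoeffding_samples(n, epsilon, delta=0.01):
    """Sample size m such that, with probability >= 1 - delta, the
    estimated covered fraction for every one of the 2^n candidate
    vertex sets is within epsilon of its true value (Hoeffding's
    inequality combined with a union bound over all 2^n sets)."""
    return math.ceil((n * math.log(2) + math.log(2 / delta)) / (2 * epsilon ** 2))
```

For n = 20 vertices and ε = 0.1 this already comes to roughly a thousand sampled paths, which matches the Theta(n/epsilon^2) scaling above.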

David Eisenstat
  • Very nice answer! However, I guess this is still a heuristic, although a good one. Btw, when you say approximate, do you mean any of the results for partial set cover are also true for this thread's problem? – Dant3 Sep 21 '15 at 14:18
  • @Dant3 If you solved the partial set cover program with all paths to optimality, then you'd have an optimal answer to your problem. There is approximation error from sampling, and approximation error from the solver if it can't solve the program to optimality. – David Eisenstat Sep 21 '15 at 14:20
  • Right. But then again, for example, we can not deduce NP-hardness because what is shown is a reduction **to** partial set cover problem, not a reduction **from** partial set cover, right? i.e. we can not deduce any complexity or approximation ratio from it. – Dant3 Sep 21 '15 at 14:22
  • @Dant3 Yes, we're reducing *to* partial set cover, so no hardness implications for your problem. My intuition, though, is that yours is inapproximable too. – David Eisenstat Sep 21 '15 at 14:23

EDIT1: After seeing the OP's update, this solution applies to the case of one source u and one sink v.

EDIT2: This is actually a heuristic; see the counter-example in the comments below.

This is an O(V+E) solution based on the method for counting paths provided in the following thread (pointed out by David Eisenstat, thanks):

Number of paths between two nodes in a DAG

In a first phase, we will apply exactly the “backward” method suggested by stalker. At the end of this phase, we will obtain and store the following information:

  • For each vertex i, the number F(i, v) of paths from i to v

  • F(u, v), the number of paths from source u to sink v.

In a second phase, we proceed forward with the same method: we simulate the algorithm as if the question were to find the “backward paths” from v to u. At the end, we obtain:

  • For each vertex i, the number B(i, u) of backward paths from i to u. Obviously B(i, u) = F(u, i)

  • B(v, u) which is equal to F(u, v).

In the third phase, we compute for each vertex i, the number P(i) = F(u, i) * F(i, v). It is easy to prove that the number of (u, v) paths that traverse i is P(i). Therefore, removing the vertex i results in removing P(i) paths.

In the fourth and last phase, we proceed greedily: remove the vertex with the highest P(i) until the total number of removed paths is at least p * F(u, v).

The overall algorithm is O(V+E), since:

  • According to the referenced post, the first two phases are O(V+E).

  • The third and fourth phases are obviously O(V), hence the overall bound is O(V+E).
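For reference, a sketch of the four phases in Python. This is a simplified, hedged version: it recomputes path counts with memoized recursion after each removal (which the comment discussion suggests is necessary, since the P values change once a vertex is gone), trading the O(V+E) bound for clarity. It remains a heuristic.

```python
def count_paths(adj, src, dst, removed):
    """Number of src -> dst paths in the DAG that avoid `removed`."""
    memo = {}
    def go(x):
        if x == dst:
            return 1
        if x not in memo:
            memo[x] = sum(go(y) for y in adj.get(x, []) if y not in removed)
        return memo[x]
    return 0 if src in removed else go(src)

def greedy_path_cut(adj, u, v, p):
    """Greedily remove the vertex with the largest current
    P(i) = F(u, i) * F(i, v) until at least p * F(u, v) of the
    u -> v paths are disconnected.  A heuristic, not optimal."""
    total = count_paths(adj, u, v, set())
    removed = set()
    while total - count_paths(adj, u, v, removed) < p * total:
        best, best_p = None, 0
        for i in adj:  # adj maps every vertex (sinks map to [])
            if i in removed or i in (u, v):
                continue
            pi = count_paths(adj, u, i, removed) * count_paths(adj, i, v, removed)
            if pi > best_p:
                best, best_p = i, pi
        if best is None:
            break  # no remaining vertex disconnects any path
        removed.add(best)
    return removed
```

On j_random_hacker's counter-example graph from the comments, with p = 1 this removes three vertices (it picks d first), where deleting just f and g would suffice.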

A.S.H
  • Thanks, I guess this is a very straight-forward algorithm. However, do you have a guarantee for the answer? I mean can you prove that it returns the optimal answer? – Dant3 Sep 21 '15 at 12:43
  • I guess so. In fact, the greedy approach of the fourth phase should guarantee that the number of removed vertices is minimal. The method of calculating P(i) is based on the post that I make reference to. – A.S.H Sep 21 '15 at 12:44
  • There is no optimisation in the first 3 phases, just a matter of correctness, if we admit the result of that post's answers. In the fourth phase, the greedy algorithm for finding the minimal cardinality set of vertices is guaranteed to be optimal. It might be not a unique solution, though. – A.S.H Sep 21 '15 at 12:53
  • A decent heuristic, but the greedy algorithm is not optimal in general. Consider the graph {sa, sb, sc, se, ad, bd, cf, df, dg, eg, ft, gt}. There are 6 s-t paths in total, with 4 passing through d and only 3 passing through each of f and g. So if p=1 (i.e. we want to disconnect s from t), then your algorithm would first choose d -- but then it requires 2 more deletions (namely, f and g -- for a total of 3 deletions) to disconnect s from t. OTOH if it had chosen f and g at the outset it could have disconnected s from t with just 2 deletions. – j_random_hacker Sep 21 '15 at 12:55
  • @j_random_hacker I verified your counter-example; you are right. This should be labelled a heuristic, then. I will see if it can be modified to deliver a complete solution. – A.S.H Sep 21 '15 at 13:01
  • @j_random_hacker Thanks for the counter example. This might be a trivial question, but I didn't understand your notation. Can you just express it in G=(V,E) terms and mention sources & sinks? – Dant3 Sep 21 '15 at 13:03
  • @Dant3: There are 9 vertices (s, t, a, b, c, d, e, f, g), with s the source and t the sink, and I just listed the set of 12 directed edges in the graph. Ask if something's still not clear. – j_random_hacker Sep 21 '15 at 13:05
  • @j_random_hacker Thanks, got it! – Dant3 Sep 21 '15 at 13:06
  • @j_random_hacker The trick is that after removing the vertex with the highest P, the P's of the other vertices get affected. I realized it only now, unfortunately. – A.S.H Sep 21 '15 at 13:12
  • @A.S.H: Yeah, I think that's the problem -- we can't work with just *counts* of paths, since the paths killed by one vertex can overlap in seemingly arbitrary ways with the paths killed by another, and we don't want to count those shared paths twice. :-( – j_random_hacker Sep 21 '15 at 13:14
  • You can reduce the multiple source/sink problem to the single source/sink case by just adding a super-source and super-sink and connecting them to the set of sources/sinks, respectively – Niklas B. Sep 21 '15 at 14:02
  • Of course then we have to make sure that neither the super source nor the super sink are cut, but you can easily modify your algorithm to ensure that – Niklas B. Sep 21 '15 at 14:15