
Given a universe of elements U = {1, 2, 3, ..., n} and a collection of m sets over this universe, {S1, S2, ..., Sm}, what is the smallest subset of U that contains at least one element from each of the m sets?

For example, given the elements U = {1,2,3,4} and the sets S = {{4,3,1},{3,1},{4}}, the sets {1,4} and {3,4} each contain at least one element from every set in S, so the minimum-sized set required here has size 2.

Any thoughts on how this can be scaled up to solve the problem for m=100 or m=1000 sets? Or thoughts on how to code this up in R or C++?

The sample data from above, using R's library(sets):

s1 <- set(4, 3, 1)
s2 <- set(3, 1)
s3 <- set(4)
s <- set(s1, s2, s3)
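
As a quick sanity check, assuming the sets package's set_intersection() (any intersection test would do): {1, 4} meets all three sets, while {1} alone misses s3 = {4}.

library(sets)
candidate <- set(1, 4)
length(set_intersection(s1, candidate)) > 0  # TRUE
length(set_intersection(s2, candidate)) > 0  # TRUE
length(set_intersection(s3, candidate)) > 0  # TRUE
length(set_intersection(s3, set(1))) > 0     # FALSE: {1} does not hit s3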

Cheers

jedfrancis
  • Did you mean n=4 and m=100? Per your definition, `m` is the number of sets and `n` is the number of elements! – Tommy Jul 19 '11 at 17:22

4 Answers


This is the hitting set problem, which is basically set cover with the roles of elements and sets interchanged. Letting A = {4, 3, 1} and B = {3, 1} and C = {4}, the element-set containment relation is

  A B C
1 + + -
2 - - -
3 + + -
4 + - +

so you basically want to solve the problem of covering {A, B, C} with sets 1 = {A, B} and 2 = {} and 3 = {A, B} and 4 = {A, C}.
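
One way to see that transposed instance in R (a small sketch using plain lists rather than the question's set objects):

sets <- list(A = c(4, 3, 1), B = c(3, 1), C = c(4))
U <- 1:4

# for each element of U, the names of the sets that contain it
memberships <- lapply(U, function(e) names(sets)[sapply(sets, function(si) e %in% si)])
names(memberships) <- U
memberships
# $`1`: "A" "B"   $`2`: character(0)   $`3`: "A" "B"   $`4`: "A" "C"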

Probably the easiest way to solve nontrivial instances of set cover in practice is to find an integer programming package with an interface to R or C++. Your example would be rendered as the following integer program, in LP format.

Minimize
    obj: x1 + x2 + x3 + x4
Subject To
    A: x1 + x3 + x4 >= 1
    B: x1 + x3 >= 1
    C: x4 >= 1
Binary
    x1 x2 x3 x4
End
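
The same model can be handed to a solver directly from R. A minimal sketch, assuming the lpSolve package (any integer programming package with an R interface works the same way): one binary variable per element, one ">= 1" constraint per set.

library(lpSolve)

obj <- rep(1, 4)                  # minimize x1 + x2 + x3 + x4
con <- rbind(A = c(1, 0, 1, 1),   # x1 + x3 + x4 >= 1
             B = c(1, 0, 1, 0),   # x1 + x3      >= 1
             C = c(0, 0, 0, 1))   # x4           >= 1
res <- lp("min", obj, con, rep(">=", 3), rep(1, 3), all.bin = TRUE)
res$objval    # 2
res$solution  # an optimal 0/1 vector, e.g. 0 0 1 1, i.e. the hitting set {3, 4}

For m = 1000 sets and n = 100 elements this is only 100 binary variables and 1000 constraints, which is a very small model for an integer programming solver.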
bar
  • hi @bar, this looks great. Trouble is (and I'm coming out of the closet here) I'm not familiar with LP! Seems like I have some weekend reading to do! :-) – jedfrancis Jul 23 '11 at 08:32

At first I misunderstood the complexity of the problem and came up with a function that finds a set that covers the m sets - but I then realized that it isn't necessarily the smallest one:

cover <- function(sets, elements = NULL) {
  if (is.null(elements)) {
    # Build the union of all sets
    su <- integer()
    for (si in sets) su <- union(su, si)
  } else {
    su <- elements
  }

  s <- su
  for (e in su) {
    # create a set candidate with element e removed
    # (iterate over values rather than positions, since s shrinks as we go)
    sc <- setdiff(s, e)

    # keep the removal only if every set still contains a remaining element
    ok <- TRUE
    for (si in sets) {
      if (!any(si %in% sc)) {
        ok <- FALSE
        break
      }
    }

    if (ok) {
      s <- sc
    }
  }

  # The resulting set
  s
}

sets <- list(s1=c(1,3,4), s2=c(1,3), s3=c(4))
cover(sets) # [1] 3 4

Then we can time it:

n <- 100  # number of elements
m <- 1000 # number of sets
sets <- lapply(seq_len(m), function(i) sample.int(n, runif(1, 1, n)))
system.time( s <- cover(sets) ) # 0.53 seconds

Not too bad, but still not the smallest one.

The obvious solution: generate all permutations of the elements, pass each to the cover function, and keep the smallest result. This will take close to "forever".

Another approach is to generate a limited number of random permutations - this way you get an approximation of the smallest set.

ns <- 10 # number of samples
elements <- seq_len(n)
smin <- elements # start from the full universe as the incumbent
for(i in seq_len(ns)) {
  s <- cover(sets, sample(elements))
  if (length(s) < length(smin)) {
    smin <- s
  }
}
length(smin) # approximate smallest length
Tommy

If you restrict each set to have 2 elements, you have the NP-complete vertex cover (node cover) problem. I would guess the more general problem is also NP-complete (for the exact version).

Foo Bah
  • hi, the elements in each set are important in the specific problem I'm solving so cannot be reduced to 2 elements. – jedfrancis Jul 19 '11 at 16:55
  • Sure it is. If the 2-element version is NP-complete, it is NP-hard too. Since you can trivially solve instances of the 2-element version with the general version (just use sets of size 2), the general version is NP-hard as well. Since you can easily verify a certificate in polynomial time (check its intersection with each set in S), it is also in NP. Hence NP-complete. – Patrick87 Jul 19 '11 at 16:57

If you're just interested in an algorithm (rather than an efficient/feasible one), you can simply generate subsets of the universe in order of increasing cardinality and check that the intersection with every set in S is non-empty. Stop as soon as you find one that works; its cardinality is the minimum possible.

The complexity of this is 2^|U| in the worst case, I think. Given Foo Bah's answer, I don't think you're going to get a polynomial-time answer...
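
A direct sketch of that search in base R (the min_hitting_set helper is only an illustration of the idea, with the sets given as plain integer vectors):

# try all subsets of the universe in order of increasing size and
# return the first one that intersects every set
min_hitting_set <- function(sets, universe) {
  for (k in seq_along(universe)) {
    for (cand in combn(universe, k, simplify = FALSE)) {
      if (all(sapply(sets, function(si) any(si %in% cand)))) {
        return(cand)
      }
    }
  }
  NULL  # only reached if some set shares no element with the universe
}

sets <- list(c(4, 3, 1), c(3, 1), c(4))
min_hitting_set(sets, 1:4)  # [1] 1 4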

Patrick87
  • I'll give this a try, but ideally I'm looking for a feasible and efficient algorithm that can scale to a very large number of sets with a large number of elements in each set. – jedfrancis Jul 19 '11 at 17:26