1

I have a list containing several vectors, e.g.:

ls=list(c("g1","g3","g6"),c("g1","g4"),c("g2","g5"),c("g2","g5"),c("g2"))

I want to capture the minimum number of elements so that I have at least one element from each vector.

So in this example, "g1" and "g2" because g1 captures vectors 1 and 2 and g2 captures vectors 1, 3, 4 and 5.

I've been looking at How to find common elements from multiple vectors? but it isn't quite the same question.

Community
  • 1
  • 1
Jessica B
  • 321
  • 1
  • 4
  • 15
  • 1
    Two points here. I'd think about converting this to values in a matrix G, containing each vector in a row (with NA for truncated values). Second, my approach here would be to find the single value that covers the most rows, put it aside (and exclude it), then to find the second value with greatest coverage etc. In this way you would have the coverage ranking for all elements. From this ranking, find the element at which total coverage is 100% and cut it there. I don't think there are specific functions, something similar is in `TraMineR` function `seqefsub`, subject to some adaptation. – Maxim.K Apr 26 '13 at 11:43

1 Answers1

1

Brute force:

ls <- list(c("g1","g3","g6"),c("g1","g4"),c("g2","g5"),c("g2","g5"),c("g2"))
#unique values:
vals <- unique(do.call(c,ls))
#matrix indicating in which list each value is present
valsin <- sapply(ls,function(x) as.integer(vals %in% x))
rownames(valsin) <- vals

#loop through numbers of values to take for combinations
for (i in seq_along(vals)) {

    cat(paste0(i,"\n"))
    #Do combinations fullfill condition?
    suff <- combn(seq_along(vals),i,FUN=function(x) {
      identical(colSums(valsin[x,,drop=FALSE]),rep(1,length(ls)))
    })
    if (sum(suff) > 0) {
      #combinations that fullfill condition
      res <- combn(vals,i)[,suff]
      #stop loop if condition has been fullfilled
      break
    }

}

res
#[1] "g1" "g2"
Roland
  • 127,288
  • 10
  • 191
  • 288