I have a list of vectors, say:

li <- list( c(1, 2, 3),
            c(1, 2, 3, 4),
            c(2, 3, 4),
            c(5, 6, 7, 8, 9, 10, 11, 12),
            numeric(0),
            c(5, 6, 7, 8, 9, 10, 11, 12, 13)
            )

I would like to remove every vector that is already contained in another vector of the list (one of greater or equal length), as well as all empty vectors.

In this case, I would be left with a list containing only:

1 2 3 4
5  6  7  8  9 10 11 12 13

Is there any useful function for achieving this?

Thanks in advance

Matthew Plourde
Ruggero
  • Take a look at [this answer](http://stackoverflow.com/a/27521122/3521006) - I think it will do what you're looking for (just put your vectors in a list) – talat Jun 16 '15 at 13:37
  • Actually, what I would like to achieve is something different: first of all, I would get rid of empty vectors. Secondly, I would like to remove vectors already contained in others... – Ruggero Jun 16 '15 at 13:46

2 Answers

First, sort the list by vector length. This guarantees that, in the removal loop, each lower-index vector is no longer than each higher-index vector, so a one-way setdiff() check against the later vectors is all you need.

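l <- list(1:3, 1:4, 2:4, 5:12, double(), 5:13)
ls <- l[order(sapply(l, length))]  # shortest vectors first
i <- 1
while (i <= length(ls) - 1) {
  # number of elements of ls[[i]] that are missing from each later vector
  n_missing <- sapply((i + 1):length(ls), function(i2) length(setdiff(ls[[i]], ls[[i2]])))
  # drop ls[[i]] if it is empty or fully contained in a later vector
  if (length(ls[[i]]) == 0 || any(n_missing == 0)) ls[[i]] <- NULL else i <- i + 1
}
ls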
## [[1]]
## [1] 1 2 3 4
##
## [[2]]
## [1]  5  6  7  8  9 10 11 12 13

Here's a slight alternative that replaces the any(sapply(...)) with a second while-loop. The advantage is that the inner loop can stop as soon as it finds a superset in the remainder of the list, instead of always scanning every later vector.

l <- list(1:3, 1:4, 2:4, 5:12, double(), 5:13)
ls <- l[order(sapply(l, length))]  # shortest vectors first
i <- 1
while (i <= length(ls) - 1) {
  res <- length(ls[[i]]) == 0  # empty vectors are always dropped
  j <- i + 1
  while (!res && j <= length(ls)) {  # stop at the first superset found
    res <- length(setdiff(ls[[i]], ls[[j]])) == 0
    j <- j + 1
  }
  if (res) ls[[i]] <- NULL else i <- i + 1
}
ls
## [[1]]
## [1] 1 2 3 4
##
## [[2]]
## [1]  5  6  7  8  9 10 11 12 13
bgoldst

x is contained in y if

length(setdiff(x, y)) == 0

You can apply it to each pair of vectors using functions like expand.grid or combn.
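
As a rough illustration of the pairwise idea, here is a minimal sketch built on expand.grid. The helper names is_contained and drop_contained are made up for this example and do not come from any package. It also assumes that when two vectors hold the same set of values, only the first one should be kept; the question does not say which one to keep, so that tie-break is a guess.

```r
# Hypothetical helper: TRUE if every element of x also occurs in y
is_contained <- function(x, y) length(setdiff(x, y)) == 0

# Hypothetical helper: drop empty vectors and vectors contained in another one
drop_contained <- function(li) {
  li <- li[lengths(li) > 0]            # remove empty vectors first
  n <- length(li)
  if (n < 2) return(li)
  # all ordered pairs (i, j) with i != j
  pairs <- expand.grid(i = seq_len(n), j = seq_len(n))
  pairs <- pairs[pairs$i != pairs$j, ]
  # vector i is dropped if it is contained in vector j; when the two are
  # set-equal, only the later one is dropped (assumed tie-break)
  drop <- mapply(function(i, j) {
    sub <- is_contained(li[[i]], li[[j]])
    sup <- is_contained(li[[j]], li[[i]])
    sub && (!sup || j < i)
  }, pairs$i, pairs$j)
  li[!(seq_len(n) %in% pairs$i[drop])]
}

li <- list(c(1, 2, 3), c(1, 2, 3, 4), c(2, 3, 4),
           c(5, 6, 7, 8, 9, 10, 11, 12), numeric(0),
           c(5, 6, 7, 8, 9, 10, 11, 12, 13))
res <- drop_contained(li)
```

This checks every ordered pair, so it performs n * (n - 1) containment tests for n non-empty vectors. That is simple to read but does more work than strictly necessary on long lists.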

Michele Usuelli
  • Wouldn't this approach be unnecessarily repetitive? If you try all combinations, then you'd be checking a removable element several times, when you only need to check it once. – Molx Jun 16 '15 at 14:09